Re: [Tutor] Text Processing Query

2013-03-14 Thread Prasad, Ramit
Spyros Charonis wrote:
> Hello Pythoners,
> 
> I am trying to extract certain fields from a file whose text looks like
> this:
> 
> COMPND   2 MOLECULE: POTASSIUM CHANNEL SUBFAMILY K MEMBER 4;
> COMPND   3 CHAIN: A, B;
> 
> COMPND  10 MOL_ID: 2;
> COMPND  11 MOLECULE: ANTIBODY FAB FRAGMENT LIGHT CHAIN;
> COMPND  12 CHAIN: D, F;
> COMPND  13 ENGINEERED: YES;
> COMPND  14 MOL_ID: 3;
> COMPND  15 MOLECULE: ANTIBODY FAB FRAGMENT HEAVY CHAIN;
> COMPND  16 CHAIN: E, G;
> 
> I would like the chain IDs, but only those following the text heading 
> "ANTIBODY FAB FRAGMENT", i.e. I
> need to create a list with D,F,E,G  which excludes A,B which have a 
> non-antibody text heading. I am
> using the following syntax:
> 
> with open(filename) as file:
>     scanfile=file.readlines()
>     for line in scanfile:
>         if line[0:6]=='COMPND' and 'FAB FRAGMENT' in line: continue
>         elif line[0:6]=='COMPND' and 'CHAIN' in line:
>             print line

There is no reason to use readlines in this example; just
iterate over the file object directly.

 with open(filename) as file:
     for line in file:
         if line[0:6]=='COMPND' and 'FAB FRAGMENT' in line: continue
         elif line[0:6]=='COMPND' and 'CHAIN' in line:
             print line


> 
> But this yields:
> 
> COMPND   3 CHAIN: A, B;
> COMPND  12 CHAIN: D, F;
> COMPND  16 CHAIN: E, G;
> 
> I would like to ignore the first line since A,B correspond to non-antibody 
> text headings, and instead
> want to extract only D,F & E,G whose text headings are specified as antibody 
> fragments.
> 
> Many thanks,
> Spyros
> 

Will 'FAB FRAGMENT' always be the line before 'CHAIN'? 
If so, then just keep track of the previous line. 

>>> raw
'COMPND   2 MOLECULE: POTASSIUM CHANNEL SUBFAMILY K MEMBER 4;\nCOMPND   3 CHAIN: A, B;\nCOMPND  10 MOL_ID: 2;\nCOMPND  11 MOLECULE: \
ANTIBODY FAB FRAGMENT LIGHT CHAIN;\nCOMPND  12 CHAIN: D, F;\nCOMPND  13 ENGINEERED: YES;\nCOMPND  14 MOL_ID: 3;\nCOMPND  15 MOLECULE\
: ANTIBODY FAB FRAGMENT HEAVY CHAIN;\nCOMPND  16 CHAIN: E, G;'

>>> prev = ''
>>> chains = []
>>> for line in raw.split('\n'):
...     if 'COMPND' in prev and 'FAB FRAGMENT' in prev and 'CHAIN' in line:
...         chains.extend(line.split(':')[1].replace(',','').replace(';','').split())
...     prev = line
... 
>>> chains
['D', 'F', 'E', 'G']
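
If 'FAB FRAGMENT' is not always the line directly before 'CHAIN', one
possible variation is to remember the most recent MOLECULE record instead
of the previous line. The following is only a sketch: the function name
fab_chains and the SAMPLE constant are made up for illustration, and it
assumes every record fits on a single line, as in the data you posted.

```python
# Sample COMPND records copied from the original question.
SAMPLE = """\
COMPND   2 MOLECULE: POTASSIUM CHANNEL SUBFAMILY K MEMBER 4;
COMPND   3 CHAIN: A, B;
COMPND  10 MOL_ID: 2;
COMPND  11 MOLECULE: ANTIBODY FAB FRAGMENT LIGHT CHAIN;
COMPND  12 CHAIN: D, F;
COMPND  13 ENGINEERED: YES;
COMPND  14 MOL_ID: 3;
COMPND  15 MOLECULE: ANTIBODY FAB FRAGMENT HEAVY CHAIN;
COMPND  16 CHAIN: E, G;
"""


def fab_chains(lines):
    """Collect chain IDs that belong to a 'FAB FRAGMENT' molecule.

    Hypothetical helper: it tracks the most recent MOLECULE record,
    so other records may sit between MOLECULE and CHAIN.
    """
    chains = []
    in_fab = False
    for line in lines:
        if not line.startswith('COMPND'):
            continue
        # Split off the 'COMPND' tag and the number, keep 'KEY: value;'.
        fields = line.split(None, 2)
        if len(fields) < 3 or ':' not in fields[2]:
            continue
        key, value = fields[2].split(':', 1)
        key = key.strip()
        if key == 'MOLECULE':
            # Each MOLECULE record switches the state on or off.
            in_fab = 'FAB FRAGMENT' in value
        elif key == 'CHAIN' and in_fab:
            ids = value.strip().rstrip(';').split(',')
            chains.extend(c.strip() for c in ids if c.strip())
    return chains


print(fab_chains(SAMPLE.splitlines()))
```

On the sample above this should give the same list as the session, but it
has not been tried against a real file.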


___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] Text Processing Query

2013-03-14 Thread Spyros Charonis
Yes, the elif line needs to have **flag_FAB == 1** as its condition instead
of **flag_FAB = 1**. So:


flag_FAB = 0  # initialise the flag before the loop

for line in scanfile:
    if line[0:6]=='COMPND' and 'FAB' in line: flag_FAB = 1
    elif line[0:6]=='COMPND' and 'CHAIN' in line and flag_FAB == 1:
        print line
        flag_FAB = 0
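
Since the goal was a list of chain IDs rather than printed lines, the same
flag logic could be wrapped up roughly as follows. This is a sketch only:
the function name chains_after_fab and the sample list are my own, and the
flag is set to 0 before the loop so the first test cannot hit an undefined
name.

```python
def chains_after_fab(lines):
    """Return chain IDs from CHAIN lines that follow a 'FAB' line."""
    chains = []
    flag_fab = 0  # start cleared, before any line is read
    for line in lines:
        if line[0:6] == 'COMPND' and 'FAB' in line:
            flag_fab = 1
        elif line[0:6] == 'COMPND' and 'CHAIN' in line and flag_fab == 1:
            # 'COMPND  12 CHAIN: D, F;' -> ['D', 'F']
            ids = line.split(':')[1].replace(';', '').split(',')
            chains.extend(i.strip() for i in ids)
            flag_fab = 0
    return chains


# The records from the original question, one string per line.
sample = [
    'COMPND   2 MOLECULE: POTASSIUM CHANNEL SUBFAMILY K MEMBER 4;',
    'COMPND   3 CHAIN: A, B;',
    'COMPND  10 MOL_ID: 2;',
    'COMPND  11 MOLECULE: ANTIBODY FAB FRAGMENT LIGHT CHAIN;',
    'COMPND  12 CHAIN: D, F;',
    'COMPND  13 ENGINEERED: YES;',
    'COMPND  14 MOL_ID: 3;',
    'COMPND  15 MOLECULE: ANTIBODY FAB FRAGMENT HEAVY CHAIN;',
    'COMPND  16 CHAIN: E, G;',
]

print(chains_after_fab(sample))
```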


On Thu, Mar 14, 2013 at 4:33 PM, Mark Lawrence wrote:

> On 14/03/2013 11:28, taserian wrote:
>
> Top posting fixed
>
>
>> On Thu, Mar 14, 2013 at 6:56 AM, Spyros Charonis wrote:
>>
>> Hello Pythoners,
>>
>> I am trying to extract certain fields from a file whose text
>> looks like this:
>>
>> COMPND   2 MOLECULE: POTASSIUM CHANNEL SUBFAMILY K MEMBER 4;
>> COMPND   3 CHAIN: A, B;
>> COMPND  10 MOL_ID: 2;
>> COMPND  11 MOLECULE: ANTIBODY FAB FRAGMENT LIGHT CHAIN;
>> COMPND  12 CHAIN: D, F;
>> COMPND  13 ENGINEERED: YES;
>> COMPND  14 MOL_ID: 3;
>> COMPND  15 MOLECULE: ANTIBODY FAB FRAGMENT HEAVY CHAIN;
>> COMPND  16 CHAIN: E, G;
>>
>> I would like the chain IDs, but only those following the text
>> heading "ANTIBODY FAB FRAGMENT", i.e. I need to create a list with
>> D,F,E,G  which excludes A,B which have a non-antibody text heading.
>> I am using the following syntax:
>>
>> with open(filename) as file:
>>     scanfile=file.readlines()
>>     for line in scanfile:
>>         if line[0:6]=='COMPND' and 'FAB FRAGMENT' in line: continue
>>         elif line[0:6]=='COMPND' and 'CHAIN' in line:
>>             print line
>>
>>
>> But this yields:
>>
>> COMPND   3 CHAIN: A, B;
>> COMPND  12 CHAIN: D, F;
>> COMPND  16 CHAIN: E, G;
>>
>> I would like to ignore the first line since A,B correspond to
>> non-antibody text headings, and instead want to extract only D,F &
>> E,G whose text headings are specified as antibody fragments.
>>
>> Many thanks,
>> Spyros
>>
>> Since the identifier and the item that you want to keep are on different
>> lines, you'll need to set a "flag".
>>
>> with open(filename) as file:
>>     scanfile=file.readlines()
>>     flag = 0
>>     for line in scanfile:
>>         if line[0:6]=='COMPND' and 'FAB FRAGMENT' in line: flag = 1
>>         elif line[0:6]=='COMPND' and 'CHAIN' in line and flag = 1:
>>             print line
>>             flag = 0
>>
>>
>> Notice that the flag is set to 1 only on "FAB FRAGMENT", and it's reset
>> to 0 after the next "CHAIN" line that follows the "FAB FRAGMENT" line.
>>
>>
>> AR
>>
>>
>>
> Notice that this code won't run due to a syntax error.
>
> --
> Cheers.
>
> Mark Lawrence
>
>


Re: [Tutor] Text Processing Query

2013-03-14 Thread Mark Lawrence

On 14/03/2013 11:28, taserian wrote:

Top posting fixed



On Thu, Mar 14, 2013 at 6:56 AM, Spyros Charonis wrote:

Hello Pythoners,

I am trying to extract certain fields from a file whose text
looks like this:

COMPND   2 MOLECULE: POTASSIUM CHANNEL SUBFAMILY K MEMBER 4;
COMPND   3 CHAIN: A, B;
COMPND  10 MOL_ID: 2;
COMPND  11 MOLECULE: ANTIBODY FAB FRAGMENT LIGHT CHAIN;
COMPND  12 CHAIN: D, F;
COMPND  13 ENGINEERED: YES;
COMPND  14 MOL_ID: 3;
COMPND  15 MOLECULE: ANTIBODY FAB FRAGMENT HEAVY CHAIN;
COMPND  16 CHAIN: E, G;

I would like the chain IDs, but only those following the text
heading "ANTIBODY FAB FRAGMENT", i.e. I need to create a list with
D,F,E,G  which excludes A,B which have a non-antibody text heading.
I am using the following syntax:

with open(filename) as file:
    scanfile=file.readlines()
    for line in scanfile:
        if line[0:6]=='COMPND' and 'FAB FRAGMENT' in line: continue
        elif line[0:6]=='COMPND' and 'CHAIN' in line:
            print line


But this yields:

COMPND   3 CHAIN: A, B;
COMPND  12 CHAIN: D, F;
COMPND  16 CHAIN: E, G;

I would like to ignore the first line since A,B correspond to
non-antibody text headings, and instead want to extract only D,F &
E,G whose text headings are specified as antibody fragments.

Many thanks,
Spyros

Since the identifier and the item that you want to keep are on different
lines, you'll need to set a "flag".

with open(filename) as file:
    scanfile=file.readlines()
    flag = 0
    for line in scanfile:
        if line[0:6]=='COMPND' and 'FAB FRAGMENT' in line: flag = 1
        elif line[0:6]=='COMPND' and 'CHAIN' in line and flag = 1:
            print line
            flag = 0


Notice that the flag is set to 1 only on "FAB FRAGMENT", and it's reset
to 0 after the next "CHAIN" line that follows the "FAB FRAGMENT" line.


AR




Notice that this code won't run due to a syntax error.
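
To see the problem without touching the file at all, the condition can be
handed to the built-in compile() as a string. This is just an illustrative
sketch: the two source strings and the helper name compiles are made up
here, and the test string stands in for a real line of the file.

```python
# The condition as posted, using '=' where a comparison is needed.
posted = "flag = 0\nif 'CHAIN' in 'CHAIN: D, F;' and flag = 1:\n    pass\n"
# The same condition written with the comparison operator '=='.
fixed = "flag = 0\nif 'CHAIN' in 'CHAIN: D, F;' and flag == 1:\n    pass\n"


def compiles(source):
    """Return True if source is syntactically valid Python."""
    try:
        compile(source, '<example>', 'exec')
    except SyntaxError:
        return False
    return True


print(compiles(posted))  # False: '=' is assignment, not allowed in a condition
print(compiles(fixed))   # True
```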

--
Cheers.

Mark Lawrence



Re: [Tutor] Text Processing Query

2013-03-14 Thread Mitya Sirenef

On 03/14/2013 07:28 AM, taserian wrote:
> Since the identifier and the item that you want to keep are on different
> lines, you'll need to set a "flag".
>
> with open(filename) as file:
>     scanfile=file.readlines()
>     flag = 0
>     for line in scanfile:
>         if line[0:6]=='COMPND' and 'FAB FRAGMENT' in line: flag = 1
>         elif line[0:6]=='COMPND' and 'CHAIN' in line and flag = 1:
>             print line
>             flag = 0
>
> Notice that the flag is set to 1 only on "FAB FRAGMENT", and it's reset
> to 0 after the next "CHAIN" line that follows the "FAB FRAGMENT" line.



I would simplify this a bit as follows:

flag = 0

for line in scanfile:
    if line.strip():
        if 'FAB FRAGMENT' in line:
            flag = 1
        elif flag:
            print line
            flag = 0

This assumes the CHAIN line always follows the MOLECULE line (otherwise the
elif needs to check for CHAIN as well). It also ignores blank lines.
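
If that assumption does not hold, the extra CHAIN check might look roughly
like this. It is a sketch only: the function name fab_chain_lines is made
up, the lines are collected in a list rather than printed, and the sample
with an ENGINEERED line sitting between MOLECULE and CHAIN is an invented
ordering for illustration, not taken from a real file.

```python
def fab_chain_lines(scanfile):
    """Return the first CHAIN line after each 'FAB FRAGMENT' line."""
    found = []
    flag = 0
    for line in scanfile:
        if line.strip():  # skip blank lines
            if 'FAB FRAGMENT' in line:
                flag = 1
            elif flag and 'CHAIN:' in line:
                # Other records after MOLECULE are skipped and leave
                # the flag set until the CHAIN record turns up.
                found.append(line.rstrip())
                flag = 0
    return found


# Invented ordering: ENGINEERED sits between MOLECULE and CHAIN.
lines = [
    'COMPND   2 MOLECULE: POTASSIUM CHANNEL SUBFAMILY K MEMBER 4;',
    'COMPND   3 CHAIN: A, B;',
    '',
    'COMPND  11 MOLECULE: ANTIBODY FAB FRAGMENT LIGHT CHAIN;',
    'COMPND  12 ENGINEERED: YES;',
    'COMPND  13 CHAIN: D, F;',
]

print(fab_chain_lines(lines))
```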

 -m



--
Lark's Tongue Guide to Python: http://lightbird.net/larks/

It is pleasant at times to play the madman.
Seneca



Re: [Tutor] Text Processing Query

2013-03-14 Thread Bod Soutar
On 14 March 2013 10:56, Spyros Charonis wrote:
> Hello Pythoners,
>
> I am trying to extract certain fields from a file whose text looks like
> this:
>
> COMPND   2 MOLECULE: POTASSIUM CHANNEL SUBFAMILY K MEMBER 4;
> COMPND   3 CHAIN: A, B;
> COMPND  10 MOL_ID: 2;
> COMPND  11 MOLECULE: ANTIBODY FAB FRAGMENT LIGHT CHAIN;
> COMPND  12 CHAIN: D, F;
> COMPND  13 ENGINEERED: YES;
> COMPND  14 MOL_ID: 3;
> COMPND  15 MOLECULE: ANTIBODY FAB FRAGMENT HEAVY CHAIN;
> COMPND  16 CHAIN: E, G;
>
> I would like the chain IDs, but only those following the text heading
> "ANTIBODY FAB FRAGMENT", i.e. I need to create a list with D,F,E,G  which
> excludes A,B which have a non-antibody text heading. I am using the
> following syntax:
>
> with open(filename) as file:
>     scanfile=file.readlines()
>     for line in scanfile:
>         if line[0:6]=='COMPND' and 'FAB FRAGMENT' in line: continue
>         elif line[0:6]=='COMPND' and 'CHAIN' in line:
>             print line
>
>
> But this yields:
>
> COMPND   3 CHAIN: A, B;
> COMPND  12 CHAIN: D, F;
> COMPND  16 CHAIN: E, G;
>
> I would like to ignore the first line since A,B correspond to non-antibody
> text headings, and instead want to extract only D,F & E,G whose text
> headings are specified as antibody fragments.
>
> Many thanks,
> Spyros
>
>
>

This is how I would do it.

with open(filename) as file:
    scanfile = file.readlines()
    wanted = "CHAIN:"
    unwanted = [" A", " B"]
    for line in scanfile:
        # print a CHAIN line once, and only if it names none of the unwanted IDs
        if wanted in line and not any(item in line for item in unwanted):
            print line

HTH,
Bodsda


Re: [Tutor] Text Processing Query

2013-03-14 Thread taserian
Since the identifier and the item that you want to keep are on different
lines, you'll need to set a "flag".

with open(filename) as file:
    scanfile=file.readlines()
    flag = 0
    for line in scanfile:
        if line[0:6]=='COMPND' and 'FAB FRAGMENT' in line: flag = 1
        elif line[0:6]=='COMPND' and 'CHAIN' in line and flag = 1:
            print line
            flag = 0


Notice that the flag is set to 1 only on "FAB FRAGMENT", and it's reset to
0 after the next "CHAIN" line that follows the "FAB FRAGMENT" line.


AR


On Thu, Mar 14, 2013 at 6:56 AM, Spyros Charonis wrote:

> Hello Pythoners,
>
> I am trying to extract certain fields from a file whose text looks
> like this:
>
> COMPND   2 MOLECULE: POTASSIUM CHANNEL SUBFAMILY K MEMBER 4;
>
> COMPND   3 CHAIN: A, B;
>
> COMPND  10 MOL_ID: 2;
>
> COMPND  11 MOLECULE: ANTIBODY FAB FRAGMENT LIGHT CHAIN;
>
> COMPND  12 CHAIN: D, F;
>
> COMPND  13 ENGINEERED: YES;
>
> COMPND  14 MOL_ID: 3;
>
> COMPND  15 MOLECULE: ANTIBODY FAB FRAGMENT HEAVY CHAIN;
>
> COMPND  16 CHAIN: E, G;
>
> I would like the chain IDs, but only those following the text heading
> "ANTIBODY FAB FRAGMENT", i.e. I need to create a list with D,F,E,G  which
> excludes A,B which have a non-antibody text heading. I am using the
> following syntax:
>
> with open(filename) as file:
>     scanfile=file.readlines()
>     for line in scanfile:
>         if line[0:6]=='COMPND' and 'FAB FRAGMENT' in line: continue
>         elif line[0:6]=='COMPND' and 'CHAIN' in line:
>             print line
>
> But this yields:
>
> COMPND   3 CHAIN: A, B;
>
> COMPND  12 CHAIN: D, F;
>
> COMPND  16 CHAIN: E, G;
>
> I would like to ignore the first line since A,B correspond to non-antibody
> text headings, and instead want to extract only D,F & E,G whose text
> headings are specified as antibody fragments.
>
> Many thanks,
> Spyros
>
>
>
>