***  For details on how to be removed from this list visit the  ***
***          CCP4 home page http://www.ccp4.ac.uk         ***


> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On 
> Behalf Of Joel Bard
> Sent: 20 January 2006 00:15
> To: [email protected]; [EMAIL PROTECTED]
> Subject: Re: [ccp4bb]: electron density map & Pymol

<snip>

> # generate a pdb with just our ligand
> awk '$1 == "ATOM" && $5 == "'$ligID'"' $pdbin > ligmap_lig${ligID}.pdb

This is a potentially error-prone way of extracting lines from a PDB
file, because PDB records are inherently not free-format (i.e.
whitespace-separated fields, none of which may be blank); instead the
data occupy fixed columns.  awk, which by default splits each line into
fields on whitespace, is designed to process free-format data, so used
this way it is quite unsuited to the task.  Consider, for example, how
the above script would handle the following cases:

HETATM 1234  CD1 LIG X   1      -2.389 -15.315  14.234 0.537 10.66
(the PDB format requires ligands and other non-standard residues to be keyed as HETATM)

HETATM12345  CD1 LIG X   1      -2.389 -15.315  14.234 0.537 10.66
(Atom number has 5 digits)

ATOM         CD1 LIG X   1      -2.389 -15.315  14.234 0.537 10.66
(Atom number column is blank)

ATOM   1234  CD1ALIG X   1      -2.389 -15.315  14.234 0.537 10.66
(Alternate location indicator)

ATOM   1234  CD1 LIG     1      -2.389 -15.315  14.234 0.537 10.66
(Chain ID is blank)

ATOM   1234  CD1 LIG X1234      -2.389 -15.315  14.234 0.537 10.66
(Residue number has 4 digits)

... and of course all combinations of the above!  Admittedly one or two
of these examples (those with blank columns) break the rules, but most
are valid (then again, people who write programs that output PDB files
are not known for sticking to the rules!).

Unix provides a versatile utility for handling fixed-format records that
many people seem unaware of, namely egrep (or grep -E), i.e. grep with
extended regular expressions.  The following command skips the
6-character record name plus exactly 15 further characters (blanks
included), so the match for $ligID always starts at column 22, the chain
ID column.  It handles all of the above cases very neatly, and is easily
generalised to perform similar tasks:

egrep "^(ATOM  |HETATM).{15}$ligID" "$pdbin"
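As a minimal sketch, here is the same pattern applied to the six sample records, written with grep -E (the equivalent spelling of egrep).  The chain ID X and the records() helper, which stands in for reading $pdbin, are hypothetical.  The first count selects on column 22 and matches every record except the one with a blank chain ID.  The second shows one way the pattern generalises: the residue name occupies columns 18-20, so it starts after the record name plus 11 more characters, and all six records match LIG.

```shell
#!/bin/sh
# Sketch: apply the fixed-column grep -E pattern to the six sample records.
# ligID=X is an assumed, illustrative chain ID; records() is a hypothetical
# helper that stands in for reading $pdbin.
ligID=X

records() {
cat <<'EOF'
HETATM 1234  CD1 LIG X   1      -2.389 -15.315  14.234 0.537 10.66
HETATM12345  CD1 LIG X   1      -2.389 -15.315  14.234 0.537 10.66
ATOM         CD1 LIG X   1      -2.389 -15.315  14.234 0.537 10.66
ATOM   1234  CD1ALIG X   1      -2.389 -15.315  14.234 0.537 10.66
ATOM   1234  CD1 LIG     1      -2.389 -15.315  14.234 0.537 10.66
ATOM   1234  CD1 LIG X1234      -2.389 -15.315  14.234 0.537 10.66
EOF
}

# 6-character record name + exactly 15 characters (blanks included)
# puts $ligID at column 22, the chainID column; -c counts matching lines.
nchain=$(records | grep -E -c "^(ATOM  |HETATM).{15}$ligID")
echo "chain $ligID: $nchain of 6 records match"

# Generalisation: the residue name occupies columns 18-20, i.e. it
# starts after the record name plus 11 more characters.
nres=$(records | grep -E -c "^(ATOM  |HETATM).{11}LIG")
echo "resName LIG: $nres of 6 records match"
```

Because every position is counted from the start of the line, blank or run-together fields cannot shift the column being tested, which is exactly where whitespace splitting goes wrong.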

-- Ian
