Ian Tickle wrote:

This is a potentially error-prone way of extracting lines from a PDB file, because the PDB record format is inherently not free (i.e. space-separated with no data allowed to be blank; rather the data are in fixed columns). awk is designed to process free-format data so is totally unsuited to the task.

Even for small PDB files containing heme, the original awk
expression is unreliable, because some atom names contain spaces:

HETATM 8614 FE   HEM   101       0.407  31.517  78.648  1.00 35.55
HETATM 8615  CHA HEM   101       2.786  31.976  81.117  1.00 37.07
HETATM 8619  N A HEM   101       2.012  32.838  78.920  1.00 36.69
HETATM 8630  N B HEM   101       0.307  32.155  76.784  1.00 37.34
HETATM 8638  N C HEM   101      -1.283  30.423  78.355  1.00 37.97
HETATM 8646  N D HEM   101       0.615  30.975  80.458  1.00 36.88
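
To make the problem concrete, here is a small sketch (Bourne-style shell rather than csh, and the variable names are only for illustration) that feeds two of the records above to plain awk and prints the field count and the fourth field of each:

```shell
# Two of the heme records above, split on whitespace by plain awk.
# Prints the number of fields (NF) and the fourth field of each record.
records='HETATM 8615  CHA HEM   101       2.786  31.976  81.117  1.00 37.07
HETATM 8619  N A HEM   101       2.012  32.838  78.920  1.00 36.69'

split_report=$(printf '%s\n' "$records" | awk '{ print NF, $4 }')
printf '%s\n' "$split_report"
# prints:
# 10 HEM
# 11 A
```

The "N A" atom name is split into two fields, so every later field shifts by one and a test on the residue-name field silently misses that record.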

But if you are on Linux, awk is probably gawk, and gawk has a
"FIELDWIDTHS" variable that lets you keep the old syntax but separate
fields by fixed width rather than by field-separator character:

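set ligID=HEM
gawk '($1 == "ATOM  " || $1 == "HETATM") && substr($4, 2) == "'$ligID'"'\
 FIELDWIDTHS="6 5 5 4 2 4 4 8 8 8" pdbin.pdb > pdbout.pdb

Note that FIELDWIDTHS="..." comes outside the awk expression, as a
variable assignment on the command line. Fixed-width fields keep their
padding, so $1 is "ATOM  " or "HETATM", and with these widths the residue
name is the last three characters of $4 (columns 17-20: the
alternate-location flag plus the residue name), not $5.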
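
If gawk is not available, a rough equivalent can be written for any POSIX awk by testing columns of the whole line with substr(). This is only a sketch: select_residue is a hypothetical helper name, it is written for a Bourne-style shell rather than csh, and it assumes the standard PDB layout with the record name in columns 1-6 and the residue name in columns 18-20.

```shell
# Hypothetical helper: print ATOM/HETATM records whose residue name
# (columns 18-20) equals the first argument. The second argument is the
# PDB file to read. Only column positions are used, never field splitting.
select_residue() {
    awk -v res="$1" '
        { rec = substr($0, 1, 6) }    # record name, columns 1-6
        (rec == "ATOM  " || rec == "HETATM") && substr($0, 18, 3) == res
    ' "$2"
}
```

It would be used as: select_residue HEM pdbin.pdb > pdbout.pdb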



