On 13/06/14 00:19, Ethan A Merritt wrote:
Earlier this year for the first time I got back a validation report from the
PDB for a deposited structure that included wwPDB validation of a ligand.
This is great stuff. I approve. I am happy.

Unfortunately the validation check reported problems with my ligand.
This is bad. I am unhappy.  What went wrong?

A long story follows.  Skip to the end for the TL;DNR summary.

Basically I am advocating that we treat errors, omissions, or inadequacies
in the CCP4 ligand dictionaries as bugs in exactly the same sense
as program bugs.  Report them when you find them, get them fixed
in CCP4 updates, and down the road we will all have better structures.

==== Long version ====

Since last year I have been happily using the integrated tools Coot,
Jligand, cprodrg, and refmac5 to sketch a ligand, generate a dictionary,
fit to initial difference density, and refine.  In the absence of an independent
validation check, I thought everything was working acceptably.
The bad grade on my wwPDB validation report [pun intended] made
me look into the guts of this tool chain more carefully trying to see
what went wrong.

In short here is what happens:

- coot fires up jligand

- I sketch the compound and click "accept"

- jligand creates a file prodrg-in.mdl that contains only
   atom type, connectivity, single/double bond flag

- cprodrg takes this and assigns each atom a more complete
   chemical label, for example
        O    15.9994      CARBONYL OXYGEN (C=O)
        CH2  12.011       ALIPHATIC CH2-GROUP
        NR   14.0067      AROMATIC NITROGEN

- cprodrg then categorizes each bond by the assigned types of
   the two bonded atoms, and similarly categorizes each bond angle
   by the assigned types of its three constituent atoms.

So far so good.  Now comes the problematic part.

- cprodrg tries to find a target geometry (ideal bond length or angle)
   for each category by matching against the contents of the file
   .../Prodrg/param/ff/default
   If an exact match is not found, it falls through to ... well I'm not
   sure exactly what the rule is for falling through.  This is the part
   that goes wrong.

The content of this default parameter file is rather impoverished.
My ligand contained a pyrazole  (5-membered ring with 2 adjacent
nitrogens).  The nitrogens were assigned a category
        NR5  14.0067      NITROGEN (5-RING)
But the default file contains no bond or angle entries for this
atom type, so it "falls through" to the only N-N bond it does contain
        NSP - NSP  target length 1.12Å
That's miles off, or at any rate more than 1/4 Å off, the expected
length of 1.396Å tabulated in the Mogul database for a pyrazole.
(The wwPDB report listed a target of 1.37Å).
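
To make the failure mode concrete, here is a minimal sketch of a lookup that falls through to a same-element entry when no exact match exists. It is not cprodrg's actual rule, which as noted above is unclear. The function names, the table layout, and the "leading letter is the element" shortcut are all hypothetical. The 1.396Å value used for the added entry is the Mogul figure quoted above, not necessarily the value that belongs in the real parameter file.

```python
# Hypothetical sketch only: this is NOT cprodrg's real lookup code.
# Bond targets in Angstrom, keyed by an order-independent pair of atom types.
BOND_TARGETS = {
    ("NSP", "NSP"): 1.12,  # the only N-N entry, as described in the post
}

def pair_key(type_a, type_b):
    """Order-independent key for a pair of atom types."""
    return tuple(sorted((type_a, type_b)))

def element_of(atom_type):
    """Crude assumption: the element is the leading letter (NR5 -> N)."""
    return atom_type[0]

def find_bond_target(type_a, type_b, table):
    """Return (target_length, matched_exactly)."""
    key = pair_key(type_a, type_b)
    if key in table:
        return table[key], True
    # Assumed fall-through: use the first entry with the same two elements.
    wanted = sorted((element_of(type_a), element_of(type_b)))
    for (entry_a, entry_b), length in table.items():
        if sorted((element_of(entry_a), element_of(entry_b))) == wanted:
            return length, False
    raise KeyError(f"no bond target for {type_a} - {type_b}")

# Without an NR5 - NR5 entry the pyrazole N-N bond inherits the NSP - NSP value.
before = find_bond_target("NR5", "NR5", BOND_TARGETS)

# Adding one entry (value assumed from the Mogul figure) gives an exact match.
patched = dict(BOND_TARGETS)
patched[pair_key("NR5", "NR5")] = 1.396
after = find_bond_target("NR5", "NR5", patched)

print(before, after)
```

A lookup like this fails silently: the caller gets a plausible-looking number either way, so returning a "matched exactly" flag, or logging every fall-through, is one way such a mismatch could be surfaced before refinement.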

I don't expect perfection, but target errors of more than 0.25Å in
bond length are large compared to the expected accuracy of even
a modest resolution protein structure.  No wonder the wwPDB
validation report flagged it as a 13 sigma outlier in the refined
structure.
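
The arithmetic behind those numbers can be checked in a few lines. This is a rough sketch, assuming the refined N-N bond was pulled all the way to the 1.12Å dictionary target and that the outlier score is a plain Z-score, (observed - target) / sigma. The actual refined bond length and the sigma used by the validation software are not given here, so the implied sigma is only an estimate.

```python
# Rough check of the numbers quoted in the post (all lengths in Angstrom).
PRODRG_TARGET = 1.12   # NSP - NSP value the dictionary fell through to
MOGUL_TARGET = 1.396   # Mogul value for a pyrazole N-N bond
WWPDB_TARGET = 1.37    # target listed in the wwPDB report
REPORTED_Z = 13        # "13 sigma outlier"

def implied_sigma(observed, target, z_score):
    """Sigma that would produce z_score, assuming Z = (observed - target) / sigma."""
    return abs(observed - target) / abs(z_score)

error_vs_mogul = round(MOGUL_TARGET - PRODRG_TARGET, 3)
error_vs_wwpdb = round(WWPDB_TARGET - PRODRG_TARGET, 3)

# Assumption: the refined bond length equals the (wrong) dictionary target.
sigma = implied_sigma(PRODRG_TARGET, WWPDB_TARGET, REPORTED_Z)

print(f"target error vs Mogul: {error_vs_mogul} A")
print(f"target error vs wwPDB: {error_vs_wwpdb} A")
print(f"implied sigma: {sigma:.3f} A")
```

Under those assumptions the implied sigma comes out at roughly 0.02Å, so a dictionary target that is wrong by a quarter of an Ångström is enough on its own to account for an outlier of that size.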

==== TL;DNR version ====

The $CCP4/share/prodrg/prodrg.param file does not contain
target values for many bond types that are correctly identified
by prodrg itself.

Adding a single line handling NR5-NR5 bonds to the source file
ccp4-6.4.0/src/Prodrg/param/ff/default
yields a significant improvement in my refined protein structure.
Even the R/Rfree are improved, which surprised me.
These were identical runs except for the regenerated ligand dictionary.

Would it be appropriate to report this as a bug?    I think so.

Where should I report it?

Dear Ethan,

This is a known problem with cprodrg - no need to report it. Cprodrg will undergo little to no maintenance from now on.

FWIW, my CCP4 colleagues are throwing their weight behind ACEDRG - which should be available in the next CCP4 release.

As an aside, I wonder how you got Coot to fire up JLigand without a link (and I thought that jligand used libcheck as its backend). I wonder if you instead meant Coot's lbg.

In the meantime you can use Coot's Mogul plug-in to update the restraint information from cprodrg. Or of course, Just Use Grade (as you imply :-).

Regards,

Paul.
