Re: [ccp4bb] Problematic PDBs

2013-11-17 Thread MARTYN SYMMONS
Hmm
So the backstory for the problematic ligand R12 in this thread is that a 
sharp-eyed worker at the wwPDB recently spotted that there was an error 
compared with the original 1999 paper. Correcting the R12 ligand is a friendly 
gesture from the PDB as it appears that the error must have been the authors' - 
the atom that corrects the R12 ligand has been inserted by the PDB staff rather 
than retrieved from a deposited structure.
 
It is a shame the same helpful approach is not always applied. One current 
example I spotted is a separate 'problematic ligand' 5AX which has been added 
by the wwPDB to at least four other authors' entries, starting in 2006 with the 
latest in 2009. 
 
5AX is basically a fragment ligand which the PDB software produces if a NAG has 
wandered too far from its Asn sidechain during refinement. If 5AX is generated 
during the PDB processing of a deposition, then it really should be highlighted 
for the authors as a geometric issue - rather than, as in these cases, being 
simply added to the coordinates. 
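
The geometric issue described above is the sort of thing that can be checked before deposition. What follows is a minimal sketch, not the wwPDB's actual procedure. It assumes standard fixed-column PDB ATOM/HETATM records, the usual atom names (C1 on NAG, ND2 on Asn) and a hypothetical 2.0 Å cutoff for calling the link broken; the function names are made up for illustration.

```python
import math

def parse_atoms(pdb_text):
    """Yield (resname, chain, resseq, atom_name, xyz) from ATOM/HETATM records."""
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM  ", "HETATM")):
            yield (
                line[17:20].strip(),   # residue name
                line[21],              # chain ID
                int(line[22:26]),      # residue sequence number
                line[12:16].strip(),   # atom name
                (float(line[30:38]), float(line[38:46]), float(line[46:54])),
            )

def detached_nags(pdb_text, max_link=2.0):
    """Return (chain, resseq, distance) for each NAG whose C1 is further than
    max_link (an assumed cutoff, in Angstrom) from every Asn ND2 atom."""
    atoms = list(parse_atoms(pdb_text))
    asn_nd2 = [xyz for res, _, _, name, xyz in atoms
               if res == "ASN" and name == "ND2"]
    flagged = []
    for res, chain, resseq, name, xyz in atoms:
        if res == "NAG" and name == "C1":
            nearest = min((math.dist(xyz, n) for n in asn_nd2),
                          default=float("inf"))
            if nearest > max_link:
                flagged.append((chain, resseq, nearest))
    return flagged
```

A NAG that is linked to another sugar rather than directly to Asn will also be reported by this sketch, and alternate conformations and symmetry mates are ignored, so the output is a list of things to look at rather than a verdict.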
 
Reading the authors' papers for the 5AX-containing entries makes it clear that 
they never expected anything other than NAG to appear in their deposited 
coordinates.
 
And given its artifactual production during deposition, 5AX should never have 
'escaped into the wild'. 
 
So if a retrospective fix can be applied to R12 (which is similar in lacking an 
atom) then it seems to me that, in fairness, a clean up of the 5AX entries 
should be arranged. 
 
Yours (not holding his breath),
Martyn
 
 

  


 From: Rachel Kramer Green kra...@rcsb.rutgers.edu
To: CCP4BB@JISCMAIL.AC.UK 
Sent: Wednesday, 6 November 2013, 16:49
Subject: Re: [ccp4bb] Problematic PDBs
  


Dear Martyn,

wwPDB staff regularly reviews and remediates PDB data and related
  dictionaries such as the Chemical Component Dictionary (CCD).

As part of our on-going remediation efforts, the chemical
  components in the archive are regularly reviewed to ensure the
  correctness and the completeness of the chemical representation.
  Such reviews show that in some cases, the author has failed to
  provide a complete description of the chemistry. To address any
  such errors, the definitions are corrected. The chemical name and
  formula are changed in the PDB file, but the coordinates are not
  changed.

In the case of entry 3CBS, issues were found with the chemical
  component definition for its ligand R12. The methyl group was not
  in the deposited coordinates and it was missing from the original
  definition. In addition, the bond order in one of the
  carbon-carbon bonds was incorrectly defined. The CCD definition
  for R12 was updated in 2011 to add the methyl group and to correct
  the bond order based on information in the primary citation. The
  coordinates for this PDB entry were not changed. Therefore, in
  accordance with wwPDB policy, the file was not obsoleted.

Sincerely,
Rachel Green


 


  
Rachel Kramer Green, Ph.D. 
RCSB PDB 
kra...@rcsb.rutgers.edu 
  
  
   

Re: [ccp4bb] Problematic PDBs

2013-10-21 Thread MARTYN SYMMONS
As a postscript it might be worth mentioning one problematic ligand that 
suggested to me a way to correct some of the errors mentioned in this thread
 
R12 is indicated as 9-(4-HYDROXY-2,6-DIMETHYL-PHENYL)-3 in the most recent 
Coot monomer library. But in the PDB ligand description it is 
9-(4-hydroxy-2,3,6-trimethylphenyl)-3,7-dimethylnona-2,4,6,8-tetraenoic acid 
with an additional carbon C16. To make a long story short this ligand was 
originally deposited missing this extra methyl group in 1999 (as part of 3CBS) 
and then apparently updated in 2011 by the PDB.

The relevant lines in the cif are
snip
R12 C16 C16 C 0 1 N N N ?      ?      ?      -6.631 1.502  0.990  C16 R12 44 
R12 H1  H1  H 0 1 N N N ?      ?      ?      -6.602 1.511  2.080  H1  R12 45 
R12 H23 H23 H 0 1 N N N ?      ?      ?      -6.422 2.503  0.613  H23 R12 46 
R12 H24 H24 H 0 1 N N N ?      ?      ?      -7.619 1.186  0.656  H24 R12 47 
snip 

with the ? ? ? indicating that refined coordinates were not available at the 
time of the update. There was initially an explanation line at the end of the 
cif:

snip
R12 Other modification 2011-10-25 RCSB CS 'add missing methyl group, 
re-define bond order based on publication'
snip

But this has mutated for some reason (premature stop codon?) over the past year 
to the following.

snip  
R12 Other modification 2011-10-25 RCSB 
snip
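
The point about the ? ? ? fields can be checked mechanically. This is a minimal sketch that reads the four atom rows quoted above and lists the atoms that have no model coordinates. It assumes the rows carry 18 whitespace-separated columns with the model x/y/z in columns 10-12 and the ideal x/y/z in columns 13-15, which is how the quoted lines appear to be laid out; the function name is made up for illustration.

```python
# The four rows exactly as quoted from the R12 component cif.
ROWS = """\
R12 C16 C16 C 0 1 N N N ?      ?      ?      -6.631 1.502  0.990  C16 R12 44
R12 H1  H1  H 0 1 N N N ?      ?      ?      -6.602 1.511  2.080  H1  R12 45
R12 H23 H23 H 0 1 N N N ?      ?      ?      -6.422 2.503  0.613  H23 R12 46
R12 H24 H24 H 0 1 N N N ?      ?      ?      -7.619 1.186  0.656  H24 R12 47
"""

def atoms_without_model_coords(rows):
    """Return [(atom_id, ideal_xyz)] for rows whose model x, y, z are all '?'."""
    missing = []
    for line in rows.splitlines():
        fields = line.split()
        if len(fields) != 18:          # assumed column count; skip anything else
            continue
        if all(value == "?" for value in fields[9:12]):
            ideal = tuple(float(value) for value in fields[12:15])
            missing.append((fields[1], ideal))
    return missing

for atom_id, ideal in atoms_without_model_coords(ROWS):
    print(atom_id, "has no model coordinates; ideal position", ideal)
```

For the quoted rows this reports C16 and its three hydrogens, i.e. exactly the methyl group that was added to the definition in 2011.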

Obviously the full correct ligand could not have been incorporated into the PDB 
entry coordinates without these undergoing a full obsolete - supersede process 
(somewhat embarrassing perhaps as one author is now a wwPDB PI ;)

But it is frustrating for users of the PDB that in such cases easily 
correctable errors are not actually updated by the authors. Would it not be 
helpful if there were a mechanism to make and track useful improvements in 
deposited structures? - Perhaps suggested by members of the community to the 
authors. 

These changes could be considered as 'corrigenda' and could be documented and 
tracked - complete with an explanation of the reasoning behind the change and 
attributing the motivation and origin of the improvement.

This would be a good way for the wider scientific community (who maybe do not 
read this bulletin board) to access the best current model without the authors 
suffering the full process of retracting and redepositing their PDB entry. The 
test for obsoleting would then be the same as for a paper - that the change 
invalidates a fundamental interpretation of the data. 

All the best
  Martyn 



 From: Pavel Afonine pafon...@gmail.com
To: CCP4BB@JISCMAIL.AC.UK 
Sent: Sunday, 20 October 2013, 19:49
Subject: Re: [ccp4bb] Problematic PDBs
 


Hello,

just for the sake of completeness: this paper lists a bunch of known 
pathologies (I would not be surprised if they've been remediated by now):

http://www.phenix-online.org/papers/he5476_reprint.pdf


Pavel



On Thu, Oct 17, 2013 at 6:51 AM, Lucas lucasbleic...@gmail.com wrote:

Dear all,

I've been lecturing in a structural bioinformatics course where graduate 
students (always consisting of people without crystallography background to 
that point) are expected to understand the basics on how x-ray structures are 
obtained, so that they know what they are using in their bioinformatics 
projects. Practices include letting them manually build a segment from an 
excellent map and also using Coot to check problems in not so good structures.

I wonder if there's a list of problematic structures somewhere that I could 
use for that practice? Apart from a few ones I'm aware of because of (bad) 
publicity, what I usually do is an advanced search on PDB for entries with 
poor resolution and bound ligands, then checking them manually, hopefully 
finding some examples of creative map interpretation. But it would be nice to 
have specific examples for each thing that can go wrong in a PDB construction.

Best regards,
Lucas


Re: [ccp4bb] Problematic PDBs

2013-10-20 Thread Pavel Afonine
Hello,

just for the sake of completeness: this paper lists a bunch of known
pathologies (I would not be surprised if they've been remediated by now):

http://www.phenix-online.org/papers/he5476_reprint.pdf

Pavel





Re: [ccp4bb] Problematic PDBs

2013-10-18 Thread Bernhard Rupp
...and there is always the twilight collection and the gems shown in the
associated paper:

 

http://www.ruppweb.org/twilight/default.htm

 

Best, BR

 

From: CCP4 bulletin board [mailto:CCP4BB@JISCMAIL.AC.UK] On Behalf Of
Alessandro Nascimento
Sent: Thursday, 17 October 2013 23:22
To: CCP4BB@JISCMAIL.AC.UK
Subject: Re: [ccp4bb] Problematic PDBs

 

Hi Lucas, 

 

this book
(http://www.amazon.com/Structural-Bioinformatics-Methods-Biochemical-Analysis/dp/0471201995/ref=sr_1_2?s=books&ie=UTF8&qid=1382044405&sr=1-2&keywords=structural+bioinformatics)
brings nice examples of protein structures with unusual features in the
structure validation chapter. I used it in my protein modeling course and it
is definitely worth buying.

 

A small list taken from the book (unless I am very much mistaken) includes
these structures: 

 

1. 2ABX

2. 1GMA

3. 1CYC

4. 3PGM

5. 1CTX

6. 2GN5

7. 2ATC

8. 1PYP

9. 4RCR

10. 1TRC

 

 

HTH,

 

--asn




[ ]s

--alessandro

 


 



Re: [ccp4bb] Problematic PDBs

2013-10-17 Thread Nat Echols
On Thu, Oct 17, 2013 at 6:51 AM, Lucas lucasbleic...@gmail.com wrote:

 I wonder if there's a list of problematic structures somewhere that I
 could use for that practice? Apart from a few ones I'm aware of because of
 (bad) publicity, what I usually do is an advanced search on PDB for entries
 with poor resolution and bound ligands, then checking them manually,
 hopefully finding some examples of creative map interpretation. But it
 would be nice to have specific examples for each thing that can go wrong in
 a PDB construction.


This would be a good place to start:

http://www.ncbi.nlm.nih.gov/pubmed/23385452

The retracted ABC transporter structures are also good, although less
obvious to the untrained eye.  I forget what the PDB IDs are but I'll see
if I can dig them up.

-Nat


Re: [ccp4bb] Problematic PDBs

2013-10-17 Thread Phil Jeffrey

From the original ABC transporter retraction:
http://www.sciencemag.org/content/314/5807/1875.2.full

The Protein Data Bank (PDB) files 1JSQ, 1PF4, and 1Z2R for MsbA and 
1S7B and 2F2M for EmrE have been moved to the archive of obsolete PDB 
entries


You can get your hands on them via URLs like:
ftp://ftp.rcsb.org/pub/pdb/data/structures/obsolete/XML/js/1jsq.xml.gz
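
For the other four entries the same URL pattern presumably applies. The sketch below builds the URLs under the assumption, inferred only from the single 1jsq example above, that the subdirectory is the middle two characters of the lower-cased ID; whether the server still serves these paths is not something this snippet checks. The function name is made up for illustration.

```python
def obsolete_xml_url(pdb_id):
    """Build the obsolete-archive XML URL for a four-character PDB ID,
    following the directory layout seen in the 1jsq example (assumed)."""
    pid = pdb_id.strip().lower()
    if len(pid) != 4:
        raise ValueError(f"expected a four-character PDB ID, got {pdb_id!r}")
    return ("ftp://ftp.rcsb.org/pub/pdb/data/structures/obsolete/XML/"
            f"{pid[1:3]}/{pid}.xml.gz")

# The five entries named in the retraction notice.
for entry in ("1JSQ", "1PF4", "1Z2R", "1S7B", "2F2M"):
    print(obsolete_xml_url(entry))
```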

Phil Jeffrey
Princeton



Re: [ccp4bb] Problematic PDBs

2013-10-17 Thread Ethan A Merritt
On Thursday, 17 October, 2013 10:51:08 Lucas wrote:
 Dear all,
 
 I've been lecturing in a structural bioinformatics course where graduate
 students (always consisting of people without crystallography background to
 that point) are expected to understand the basics on how x-ray structures
 are obtained, so that they know what they are using in their bioinformatics
 projects. Practices include letting them manually build a segment from an
 excellent map and also using Coot to check problems in not so good
 structures.
 
 I wonder if there's a list of problematic structures somewhere that I could
 use for that practice?

4KAP is a nice cautionary example of failing to properly refine a ligand
after placement.   

- Open Coot, download 4KAP + map from EDS.  
- Navigate to ligand and view difference density map.   
- Oops.
- Now open up residue information for the ligand.  Notice anything odd?

For bonus points, look up the known ligation chemistry of this site.
Notice that the binding pose of the 4KAP ligand does not match it.

Ethan

 Apart from a few ones I'm aware of because of (bad)
 publicity, what I usually do is an advanced search on PDB for entries with
 poor resolution and bound ligands, then checking them manually, hopefully
 finding some examples of creative map interpretation. But it would be nice
 to have specific examples for each thing that can go wrong in a PDB
 construction.
 
 Best regards,
 Lucas


Re: [ccp4bb] Problematic PDBs

2013-10-17 Thread Dale Tronrud
   I would start with 1E4M (residue 361 of chain M) and 1QW9 (170 of
chain B).  First show the model and then reveal the electron density.
This promotes a healthy skepticism of PDB models and reinforces the
importance of always looking at a model in the context of the map.

   For model building I would recommend 2PWJ and 3SQK.  In 3SQK the
linker to the His tag in chain B was built using the wrong sequence.
It is fairly easy to build a sequence into the density and then
recognize what the linker actually is.  In 2PWJ the wrong sequence was
used up to residue 31.  I've never been able to figure out how this
error came to be.  Some horrible, horrible mistake was made when
sequencing the gene and the person who built the model believed the
sequence more than the density.  The model building required to correct
2PWJ is more challenging since a number of short cuts were made
cutting out loops.  If I recall, my model has about 10 more amino acids
than the PDB model.

In all of these cases the majority of the residues in each model are
fine.  3SQK has been replaced with a corrected model (4F4J).

Dale Tronrud



Re: [ccp4bb] Problematic PDBs

2013-10-17 Thread Tanner, John J.
I use 2QNS for teaching. It is an egregious case of modeling ligand into noise. 
 Also, the structure has many close contacts (e.g. HOH A351),  poor 
stereochemistry (e.g. A58-A61), and incorrectly built water.  Turn on symmetry 
to see the steric clash of the peptide ligand with itself.  You can get the 
coordinates and maps from EDS.
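
Close contacts of the kind mentioned for HOH A351 can be listed from the coordinate file before opening a graphics program. This is a minimal sketch only: it assumes fixed-column PDB ATOM/HETATM records, uses a hypothetical 2.2 Å cutoff, looks at water atoms only, and ignores symmetry mates and alternate conformations (so it will not find the symmetry clash of the peptide ligand). The function name is made up for illustration.

```python
import math

def water_close_contacts(pdb_text, cutoff=2.2):
    """Return (water_chain, water_resseq, partner_resname, partner_chain,
    partner_resseq, partner_atom, distance) for each water atom lying
    closer than cutoff (assumed value, in Angstrom) to an atom of another residue."""
    atoms = []
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM  ", "HETATM")):
            atoms.append((
                line[17:20].strip(),   # residue name
                line[21],              # chain ID
                int(line[22:26]),      # residue sequence number
                line[12:16].strip(),   # atom name
                (float(line[30:38]), float(line[38:46]), float(line[46:54])),
            ))
    contacts = []
    for i, water in enumerate(atoms):
        if water[0] != "HOH":
            continue
        for j, other in enumerate(atoms):
            if i == j or other[:3] == water[:3]:
                continue               # skip the atom itself and its own residue
            if other[0] == "HOH" and j < i:
                continue               # report each water-water pair only once
            distance = math.dist(water[4], other[4])
            if distance < cutoff:
                contacts.append((water[1], water[2], other[0], other[1],
                                 other[2], other[3], distance))
    return contacts
```

Metal-coordinated waters can legitimately sit closer than this cutoff, so each hit still needs to be inspected against the map.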

http://www.ncbi.nlm.nih.gov/pubmed/18611381
http://www.ncbi.nlm.nih.gov/pubmed/21827955
http://retractionwatch.wordpress.com/2011/08/16/ties-that-dont-bind-group-retracts-parathyroid-hormone-crystallography-paper/
http://retractionwatch.wordpress.com/2012/01/26/pnas-retraction-marks-second-for-crystallography-group/


John J. Tanner
Professor of Biochemistry and Chemistry
University of Missouri-Columbia
125 Chemistry Building
Columbia, MO 65211
Phone: 573-884-1280
Fax: 573-882-2754
Email: tanne...@missouri.edu
http://faculty.missouri.edu/~tannerjj/tannergroup/tanner.html




Re: [ccp4bb] Problematic PDBs

2013-10-17 Thread Roger Rowlett
Yikes! This cuts close to my area. We occasionally have undergrads solve 
and refine carbonic anhydrase-sulfonamide structures as a part of a 
4-hour biochemistry teaching lab. (We have a whole shelf-full of 
sulfonamides that make excellent teaching projects.)

___
Roger S. Rowlett
Gordon & Dorothy Kline Professor
Department of Chemistry
Colgate University
13 Oak Drive
Hamilton, NY 13346

tel: (315)-228-7245
ofc: (315)-228-7395
fax: (315)-228-7935
email: rrowl...@colgate.edu



Re: [ccp4bb] Problematic PDBs

2013-10-17 Thread Alessandro Nascimento
Hi Lucas,

this book (
http://www.amazon.com/Structural-Bioinformatics-Methods-Biochemical-Analysis/dp/0471201995/ref=sr_1_2?s=books&ie=UTF8&qid=1382044405&sr=1-2&keywords=structural+bioinformatics)
brings nice examples of protein structures with unusual features in the
structure validation chapter. I used it in my protein modeling course and
it is definitely worth buying.

A small list taken from the book (unless I am very much mistaken) includes
these structures:

1. 2ABX
2. 1GMA
3. 1CYC
4. 3PGM
5. 1CTX
6. 2GN5
7. 2ATC
8. 1PYP
9. 4RCR
10. 1TRC


HTH,

--asn

[ ]s

--alessandro

