Re: [ccp4bb] Problematic PDBs
Hmm. So the backstory for the problematic ligand R12 in this thread is that a sharp-eyed worker at the wwPDB recently spotted that there was an error compared with the original 1999 paper. Correcting the R12 ligand is a friendly gesture from the PDB, as it appears that the error must have been the authors' - the atom correcting the R12 ligand has been inserted by the PDB staff rather than retrieved from a deposited structure.

It is a shame the same helpful approach is not always applied. One current example I spotted is a separate 'problematic ligand', 5AX, which has been added by the wwPDB to at least four other authors' entries, starting in 2006 with the latest in 2009. 5AX is basically a fragment ligand which the PDB software produces if a NAG has wandered too far from its Asn sidechain during refinement. If 5AX is generated during the PDB processing of a deposition, then it really should be highlighted for the authors as a geometric issue - rather than, as in these cases, being simply added to the coordinates.

Reading the authors' papers for the 5AX-containing entries makes it clear that they never expected anything other than NAG to appear in their deposited coordinates. And given its artifactual production during deposition, 5AX should never have 'escaped into the wild'.

So if a retrospective fix can be applied to R12 (which is similar in lacking an atom), then it seems to me that, in fairness, a clean-up of the 5AX entries should be arranged.

Yours (not holding his breath),
Martyn

From: Rachel Kramer Green kra...@rcsb.rutgers.edu
To: CCP4BB@JISCMAIL.AC.UK
Sent: Wednesday, 6 November 2013, 16:49
Subject: Re: [ccp4bb] Problematic PDBs

Dear Martyn,

wwPDB staff regularly reviews and remediates PDB data and related dictionaries such as the Chemical Component Dictionary (CCD). As part of our on-going remediation efforts, the chemical components in the archive are regularly reviewed to ensure the correctness and the completeness of the chemical representation.
Such reviews show that in some cases, the author has failed to provide a complete description of the chemistry. To address any such errors, the definitions are corrected. The chemical name and formula are changed in the PDB file, but the coordinates are not changed.

In the case of entry 3CBS, issues were found with the chemical component definition for its ligand R12. The methyl group was not in the deposited coordinates and it was missing from the original definition. In addition, the bond order in one of the carbon-carbon bonds was incorrectly defined. The CCD definition for R12 was updated in 2011 to add the methyl group and to correct the bond order based on information in the primary citation. The coordinates for this PDB entry were not changed. Therefore, in accordance with wwPDB policy, the file was not obsoleted.

Sincerely,
Rachel Green

Rachel Kramer Green, Ph.D.
RCSB PDB
kra...@rcsb.rutgers.edu
Twitter: https://twitter.com/#!/buildmodels
Facebook: http://www.facebook.com/RCSBPDB

On 10/21/2013 6:28 AM, MARTYN SYMMONS wrote:

> As a postscript it might be worth mentioning one problematic ligand that suggested to me a way to correct some of the errors mentioned in this thread.
>
> R12 is indicated as 9-(4-HYDROXY-2,6-DIMETHYL-PHENYL)-3 in the most recent Coot monomer library. But in the PDB ligand description it is 9-(4-hydroxy-2,3,6-trimethylphenyl)-3,7-dimethylnona-2,4,6,8-tetraenoic acid with an additional carbon C16.
>
> To make a long story short, this ligand was originally deposited missing this extra methyl group in 1999 (as part of 3CBS) and then apparently updated in 2011 by the PDB. (The relevant lines in the cif are
>
> snip
> R12 C16 C16 C 0 1 N N N ? ? ? -6.631 1.502 0.990 C16 R12 44
> R12 H1 H1 H 0 1 N N N ? ? ? -6.602 1.511 2.080 H1 R12 45
> R12 H23 H23 H 0 1 N N N ? ? ? -6.422 2.503 0.613 H23 R12 46
> R12 H24 H24 H 0 1 N N N ? ? ? -7.619 1.186 0.656 H24 R12 47
> snip
>
> with the ? ? ? indicating that refined coordinates were not available at the time of the update.)
>
> There was initially an explanation line at the end of the cif:
>
> snip
> R12 Other modification 2011-10-25 RCSB CS 'add missing methyl group, re-define bond order based on publication'
> snip
>
> But this has mutated for some reason (premature stop codon?) over the past year to the following.
>
> snip
> R12 Other modification 2011-10-25 RCSB
> snip
>
> Obviously the full correct ligand could not have been incorporated into the PDB entry coordinates without these undergoing a full obsolete - supersede process (somewhat embarrassing perhaps as one author is now a wwPDB PI ;)
>
> But it is frustrating for users of the PDB that in such cases easily correctable
Re: [ccp4bb] Problematic PDBs
As a postscript it might be worth mentioning one problematic ligand that suggested to me a way to correct some of the errors mentioned in this thread.

R12 is indicated as 9-(4-HYDROXY-2,6-DIMETHYL-PHENYL)-3 in the most recent Coot monomer library. But in the PDB ligand description it is 9-(4-hydroxy-2,3,6-trimethylphenyl)-3,7-dimethylnona-2,4,6,8-tetraenoic acid with an additional carbon C16.

To make a long story short, this ligand was originally deposited missing this extra methyl group in 1999 (as part of 3CBS) and then apparently updated in 2011 by the PDB. (The relevant lines in the cif are

snip
R12 C16 C16 C 0 1 N N N ? ? ? -6.631 1.502 0.990 C16 R12 44
R12 H1 H1 H 0 1 N N N ? ? ? -6.602 1.511 2.080 H1 R12 45
R12 H23 H23 H 0 1 N N N ? ? ? -6.422 2.503 0.613 H23 R12 46
R12 H24 H24 H 0 1 N N N ? ? ? -7.619 1.186 0.656 H24 R12 47
snip

with the ? ? ? indicating that refined coordinates were not available at the time of the update.)

There was initially an explanation line at the end of the cif:

snip
R12 Other modification 2011-10-25 RCSB CS 'add missing methyl group, re-define bond order based on publication'
snip

But this has mutated for some reason (premature stop codon?) over the past year to the following.

snip
R12 Other modification 2011-10-25 RCSB
snip

Obviously the full correct ligand could not have been incorporated into the PDB entry coordinates without these undergoing a full obsolete - supersede process (somewhat embarrassing perhaps as one author is now a wwPDB PI ;)

But it is frustrating for users of the PDB that in such cases easily correctable errors are not actually updated by the authors. Would it not be helpful if there were a mechanism to make and track useful improvements in deposited structures? - Perhaps suggested by members of the community to the authors.
These changes could be considered as 'corrigenda' and could be documented and tracked - complete with an explanation of the reasoning behind the change and attributing the motivation and origin of the improvement. This would be a good way for the wider scientific community (who maybe do not read this bulletin board) to access the best current model without the authors suffering the full process of retracting and redepositing their PDB entry. The test for obsoleting would then be the same as for a paper - that the change invalidates a fundamental interpretation of the data.

All the best
Martyn

From: Pavel Afonine pafon...@gmail.com
To: CCP4BB@JISCMAIL.AC.UK
Sent: Sunday, 20 October 2013, 19:49
Subject: Re: [ccp4bb] Problematic PDBs

Hello,

just for the sake of completeness: this paper lists a bunch of known pathologies (I would not be surprised if they've been remediated by now):

http://www.phenix-online.org/papers/he5476_reprint.pdf

Pavel

On Thu, Oct 17, 2013 at 6:51 AM, Lucas lucasbleic...@gmail.com wrote:

> Dear all,
>
> I've been lecturing in a structural bioinformatics course where graduate students (always consisting of people without crystallography background to that point) are expected to understand the basics of how X-ray structures are obtained, so that they know what they are using in their bioinformatics projects. Practices include letting them manually build a segment from an excellent map and also using Coot to check problems in not so good structures.
>
> I wonder if there's a list of problematic structures somewhere that I could use for that practice? Apart from a few ones I'm aware of because of (bad) publicity, what I usually do is an advanced search on PDB for entries with poor resolution and bound ligands, then checking them manually, hopefully finding some examples of creative map interpretation. But it would be nice to have specific examples for each thing that can go wrong in a PDB construction.
>
> Best regards,
> Lucas
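For readers who want to check the R12 rows quoted in Martyn's message without reading them by eye, here is a minimal Python sketch. It assumes the rows follow the usual column order of the Chemical Component Dictionary atom loop (component, atom name, alternate name, element, charge, four flag columns, three model coordinates, three ideal coordinates, then three bookkeeping columns); that ordering is an assumption, not something stated in the thread. The function name `atoms_without_model_coords` is made up for illustration.

```python
# Rows copied from the cif excerpt quoted in the thread.
ROWS = """\
R12 C16 C16 C 0 1 N N N ? ? ? -6.631 1.502 0.990 C16 R12 44
R12 H1 H1 H 0 1 N N N ? ? ? -6.602 1.511 2.080 H1 R12 45
R12 H23 H23 H 0 1 N N N ? ? ? -6.422 2.503 0.613 H23 R12 46
R12 H24 H24 H 0 1 N N N ? ? ? -7.619 1.186 0.656 H24 R12 47
"""

# ASSUMPTION: 18 whitespace-separated columns per atom row, with the
# model (deposited) x, y, z in columns 10-12 (0-based indices 9..11).
EXPECTED_COLUMNS = 18
MODEL_XYZ = slice(9, 12)


def atoms_without_model_coords(rows: str) -> list[str]:
    """Return atom names whose three model coordinates are all '?'."""
    missing = []
    for line in rows.splitlines():
        fields = line.split()
        if len(fields) != EXPECTED_COLUMNS:
            continue  # skip anything that is not a plain atom row
        if all(value == "?" for value in fields[MODEL_XYZ]):
            missing.append(fields[1])  # second column is the atom name
    return missing


if __name__ == "__main__":
    print(atoms_without_model_coords(ROWS))
```

Run on the quoted rows, this lists the methyl carbon C16 and its three hydrogens, which matches the observation that the added methyl group has ideal coordinates only.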
Re: [ccp4bb] Problematic PDBs
Hello,

just for the sake of completeness: this paper lists a bunch of known pathologies (I would not be surprised if they've been remediated by now):

http://www.phenix-online.org/papers/he5476_reprint.pdf

Pavel

On Thu, Oct 17, 2013 at 6:51 AM, Lucas lucasbleic...@gmail.com wrote:

> Dear all,
>
> I've been lecturing in a structural bioinformatics course where graduate students (always consisting of people without crystallography background to that point) are expected to understand the basics of how X-ray structures are obtained, so that they know what they are using in their bioinformatics projects. Practices include letting them manually build a segment from an excellent map and also using Coot to check problems in not so good structures.
>
> I wonder if there's a list of problematic structures somewhere that I could use for that practice? Apart from a few ones I'm aware of because of (bad) publicity, what I usually do is an advanced search on PDB for entries with poor resolution and bound ligands, then checking them manually, hopefully finding some examples of creative map interpretation. But it would be nice to have specific examples for each thing that can go wrong in a PDB construction.
>
> Best regards,
> Lucas
Re: [ccp4bb] Problematic PDBs
...and there is always the twilight collection and the gems shown in the associated paper:

http://www.ruppweb.org/twilight/default.htm

Best, BR

From: CCP4 bulletin board [mailto:CCP4BB@JISCMAIL.AC.UK] On Behalf Of Alessandro Nascimento
Sent: Thursday, 17 October 2013 23:22
To: CCP4BB@JISCMAIL.AC.UK
Subject: Re: [ccp4bb] Problematic PDBs

Hi Lucas,

this book (http://www.amazon.com/Structural-Bioinformatics-Methods-Biochemical-Analysis/dp/0471201995/ref=sr_1_2?s=booksie=UTF8qid=1382044405sr=1-2keywords=structural+bioinformatics) has nice examples of protein structures with unusual features in the structure validation chapter. I used it in my protein modeling course and it is definitely worth buying.

A small list taken from the book (unless I am very much mistaken) includes these structures:

1. 2ABX
2. 1GMA
3. 1CYC
4. 3PGM
5. 1CTX
6. 2GN5
7. 2ATC
8. 1PYP
9. 4RCR
10. 1TRC

HTH,
--asn

[ ]s
--alessandro

2013/10/17 Lucas lucasbleic...@gmail.com

> Dear all,
>
> I've been lecturing in a structural bioinformatics course where graduate students (always consisting of people without crystallography background to that point) are expected to understand the basics of how X-ray structures are obtained, so that they know what they are using in their bioinformatics projects. Practices include letting them manually build a segment from an excellent map and also using Coot to check problems in not so good structures.
>
> I wonder if there's a list of problematic structures somewhere that I could use for that practice? Apart from a few ones I'm aware of because of (bad) publicity, what I usually do is an advanced search on PDB for entries with poor resolution and bound ligands, then checking them manually, hopefully finding some examples of creative map interpretation. But it would be nice to have specific examples for each thing that can go wrong in a PDB construction.
>
> Best regards,
> Lucas
Re: [ccp4bb] Problematic PDBs
On Thu, Oct 17, 2013 at 6:51 AM, Lucas lucasbleic...@gmail.com wrote:

> I wonder if there's a list of problematic structures somewhere that I could use for that practice? Apart from a few ones I'm aware of because of (bad) publicity, what I usually do is an advanced search on PDB for entries with poor resolution and bound ligands, then checking them manually, hopefully finding some examples of creative map interpretation. But it would be nice to have specific examples for each thing that can go wrong in a PDB construction.

This would be a good place to start:

http://www.ncbi.nlm.nih.gov/pubmed/23385452

The retracted ABC transporter structures are also good, although less obvious to the untrained eye. I forget what the PDB IDs are but I'll see if I can dig them up.

-Nat
Re: [ccp4bb] Problematic PDBs
From the original ABC transporter retraction:

http://www.sciencemag.org/content/314/5807/1875.2.full

The Protein Data Bank (PDB) files 1JSQ, 1PF4, and 1Z2R for MsbA and 1S7B and 2F2M for EmrE have been moved to the archive of obsolete PDB entries

You can get your hands on them via URLs like:

ftp://ftp.rcsb.org/pub/pdb/data/structures/obsolete/XML/js/1jsq.xml.gz

Phil Jeffrey
Princeton

On 10/17/13 10:26 AM, Nat Echols wrote:

> On Thu, Oct 17, 2013 at 6:51 AM, Lucas lucasbleic...@gmail.com wrote:
>
>> I wonder if there's a list of problematic structures somewhere that I could use for that practice? Apart from a few ones I'm aware of because of (bad) publicity, what I usually do is an advanced search on PDB for entries with poor resolution and bound ligands, then checking them manually, hopefully finding some examples of creative map interpretation. But it would be nice to have specific examples for each thing that can go wrong in a PDB construction.
>
> This would be a good place to start:
>
> http://www.ncbi.nlm.nih.gov/pubmed/23385452
>
> The retracted ABC transporter structures are also good, although less obvious to the untrained eye. I forget what the PDB IDs are but I'll see if I can dig them up.
>
> -Nat
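The example URL in Phil's message suggests a pattern: the file sits in a directory named after the middle two characters of the lower-cased PDB ID. A small Python sketch that builds such URLs for the five retracted entries follows. The pattern is inferred from that single example and is therefore an assumption, the helper name `obsolete_xml_url` is hypothetical, and whether the FTP paths still resolve is not checked here.

```python
# Base path copied from the example URL in the thread.
OBSOLETE_XML_BASE = "ftp://ftp.rcsb.org/pub/pdb/data/structures/obsolete/XML"

# The five entries named in the retraction notice.
RETRACTED_IDS = ["1JSQ", "1PF4", "1Z2R", "1S7B", "2F2M"]


def obsolete_xml_url(pdb_id: str) -> str:
    """Build an obsolete-archive URL for a four-character PDB ID.

    ASSUMPTION: the subdirectory is the middle two characters of the
    lower-cased ID, as in .../XML/js/1jsq.xml.gz from the thread.
    """
    pdb_id = pdb_id.strip().lower()
    if len(pdb_id) != 4 or not pdb_id.isalnum():
        raise ValueError(f"not a four-character PDB ID: {pdb_id!r}")
    return f"{OBSOLETE_XML_BASE}/{pdb_id[1:3]}/{pdb_id}.xml.gz"


if __name__ == "__main__":
    for entry in RETRACTED_IDS:
        print(obsolete_xml_url(entry))
```

The sketch only constructs strings; fetching and unpacking the files is left to whatever download tool is at hand.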
Re: [ccp4bb] Problematic PDBs
On Thursday, 17 October, 2013 10:51:08 Lucas wrote:

> Dear all,
>
> I've been lecturing in a structural bioinformatics course where graduate students (always consisting of people without crystallography background to that point) are expected to understand the basics of how X-ray structures are obtained, so that they know what they are using in their bioinformatics projects. Practices include letting them manually build a segment from an excellent map and also using Coot to check problems in not so good structures.
>
> I wonder if there's a list of problematic structures somewhere that I could use for that practice?

4KAP is a nice cautionary example of failing to properly refine a ligand after placement.

- Open coot, download 4KAP + map from EDS.
- Navigate to ligand and view difference density map.
- Oops.
- Now open up residue information for the ligand. Notice anything odd?

For bonus points, look up the known ligation chemistry of this site. Notice that the binding pose of the 4KAP ligand does not match it.

Ethan

> Apart from a few ones I'm aware of because of (bad) publicity, what I usually do is an advanced search on PDB for entries with poor resolution and bound ligands, then checking them manually, hopefully finding some examples of creative map interpretation. But it would be nice to have specific examples for each thing that can go wrong in a PDB construction.
>
> Best regards,
> Lucas
Re: [ccp4bb] Problematic PDBs
I would start with 1E4M (residue 361 of chain M) and 1QW9 (170 of chain B). First show the model and then reveal the electron density. This promotes a healthy skepticism of PDB models and reinforces the importance of always looking at a model in the context of the map.

For model building I would recommend 2PWJ and 3SQK. In 3SQK the linker to the His tag in chain B was built using the wrong sequence. It is fairly easy to build a sequence into the density and then recognize what the linker actually is.

In 2PWJ the wrong sequence was used up to residue 31. I've never been able to figure out how this error came to be. Some horrible, horrible mistake was made when sequencing the gene and the person who built the model believed the sequence more than the density. The model building required to correct 2PWJ is more challenging since a number of short cuts were made cutting out loops. If I recall, my model has about 10 more amino acids than the PDB model.

In all of these cases the majority of the residues in each model are fine. 3SQK has been replaced with a corrected model (4F4J).

Dale Tronrud

On 10/17/2013 06:51 AM, Lucas wrote:

> Dear all,
>
> I've been lecturing in a structural bioinformatics course where graduate students (always consisting of people without crystallography background to that point) are expected to understand the basics of how X-ray structures are obtained, so that they know what they are using in their bioinformatics projects. Practices include letting them manually build a segment from an excellent map and also using Coot to check problems in not so good structures.
>
> I wonder if there's a list of problematic structures somewhere that I could use for that practice? Apart from a few ones I'm aware of because of (bad) publicity, what I usually do is an advanced search on PDB for entries with poor resolution and bound ligands, then checking them manually, hopefully finding some examples of creative map interpretation. But it would be nice to have specific examples for each thing that can go wrong in a PDB construction.
>
> Best regards,
> Lucas
Re: [ccp4bb] Problematic PDBs
I use 2QNS for teaching. It is an egregious case of modeling ligand into noise. Also, the structure has many close contacts (e.g. HOH A351), poor stereochemistry (e.g. A58-A61), and incorrectly built water. Turn on symmetry to see the steric clash of the peptide ligand with itself. You can get the coordinates and maps from EDS.

http://www.ncbi.nlm.nih.gov/pubmed/18611381
http://www.ncbi.nlm.nih.gov/pubmed/21827955
http://retractionwatch.wordpress.com/2011/08/16/ties-that-dont-bind-group-retracts-parathyroid-hormone-crystallography-paper/
http://retractionwatch.wordpress.com/2012/01/26/pnas-retraction-marks-second-for-crystallography-group/

John J. Tanner
Professor of Biochemistry and Chemistry
University of Missouri-Columbia
125 Chemistry Building
Columbia, MO 65211
Phone: 573-884-1280
Fax: 573-882-2754
Email: tanne...@missouri.edu
http://faculty.missouri.edu/~tannerjj/tannergroup/tanner.html

On Oct 17, 2013, at 8:51 AM, Lucas lucasbleic...@gmail.com wrote:

> Dear all,
>
> I've been lecturing in a structural bioinformatics course where graduate students (always consisting of people without crystallography background to that point) are expected to understand the basics of how X-ray structures are obtained, so that they know what they are using in their bioinformatics projects. Practices include letting them manually build a segment from an excellent map and also using Coot to check problems in not so good structures.
>
> I wonder if there's a list of problematic structures somewhere that I could use for that practice? Apart from a few ones I'm aware of because of (bad) publicity, what I usually do is an advanced search on PDB for entries with poor resolution and bound ligands, then checking them manually, hopefully finding some examples of creative map interpretation. But it would be nice to have specific examples for each thing that can go wrong in a PDB construction.
>
> Best regards,
> Lucas
Re: [ccp4bb] Problematic PDBs
Yikes! This cuts close to my area. We occasionally have undergrads solve and refine carbonic anhydrase-sulfonamide structures as a part of a 4-hour biochemistry teaching lab. (We have a whole shelf-full of sulfonamides that make excellent teaching projects.)

___
Roger S. Rowlett
Gordon & Dorothy Kline Professor
Department of Chemistry
Colgate University
13 Oak Drive
Hamilton, NY 13346
tel: (315)-228-7245
ofc: (315)-228-7395
fax: (315)-228-7935
email: rrowl...@colgate.edu

On 10/17/2013 12:55 PM, Ethan A Merritt wrote:

> On Thursday, 17 October, 2013 10:51:08 Lucas wrote:
>
>> Dear all,
>>
>> I've been lecturing in a structural bioinformatics course where graduate students (always consisting of people without crystallography background to that point) are expected to understand the basics of how X-ray structures are obtained, so that they know what they are using in their bioinformatics projects. Practices include letting them manually build a segment from an excellent map and also using Coot to check problems in not so good structures.
>>
>> I wonder if there's a list of problematic structures somewhere that I could use for that practice?
>
> 4KAP is a nice cautionary example of failing to properly refine a ligand after placement.
>
> - Open coot, download 4KAP + map from EDS.
> - Navigate to ligand and view difference density map.
> - Oops.
> - Now open up residue information for the ligand. Notice anything odd?
>
> For bonus points, look up the known ligation chemistry of this site. Notice that the binding pose of the 4KAP ligand does not match it.
>
> Ethan
>
>> Apart from a few ones I'm aware of because of (bad) publicity, what I usually do is an advanced search on PDB for entries with poor resolution and bound ligands, then checking them manually, hopefully finding some examples of creative map interpretation. But it would be nice to have specific examples for each thing that can go wrong in a PDB construction.
>>
>> Best regards,
>> Lucas
Re: [ccp4bb] Problematic PDBs
Hi Lucas,

this book (http://www.amazon.com/Structural-Bioinformatics-Methods-Biochemical-Analysis/dp/0471201995/ref=sr_1_2?s=booksie=UTF8qid=1382044405sr=1-2keywords=structural+bioinformatics) has nice examples of protein structures with unusual features in the structure validation chapter. I used it in my protein modeling course and it is definitely worth buying.

A small list taken from the book (unless I am very much mistaken) includes these structures:

1. 2ABX
2. 1GMA
3. 1CYC
4. 3PGM
5. 1CTX
6. 2GN5
7. 2ATC
8. 1PYP
9. 4RCR
10. 1TRC

HTH,
--asn

[ ]s
--alessandro

2013/10/17 Lucas lucasbleic...@gmail.com

> Dear all,
>
> I've been lecturing in a structural bioinformatics course where graduate students (always consisting of people without crystallography background to that point) are expected to understand the basics of how X-ray structures are obtained, so that they know what they are using in their bioinformatics projects. Practices include letting them manually build a segment from an excellent map and also using Coot to check problems in not so good structures.
>
> I wonder if there's a list of problematic structures somewhere that I could use for that practice? Apart from a few ones I'm aware of because of (bad) publicity, what I usually do is an advanced search on PDB for entries with poor resolution and bound ligands, then checking them manually, hopefully finding some examples of creative map interpretation. But it would be nice to have specific examples for each thing that can go wrong in a PDB construction.
>
> Best regards,
> Lucas