Re: [Jmol-developers] Re: [Fwd: Re: two mmCIF issues]

Bob Hanson Thu, 04 May 2006 10:31:07 -0700

OK, turns out there isn't any problem with this in the prototype CIF reader.

Both 10.2 and 10.x read PDB files identically with this '0' to ' ' change.

Eric, both troublemakers, 1bkx and 1d66, and several others read identically in CIF and PDB.

Some files (1pgb) read differently in CIF and PDB intentionally -- RCSB has redefined or "corrected" the files in terms of what's an atom, hetatm, etc., and what can have a chain designation (not heteroatoms).

Some files (1pgb) read differently in CIF and PDB because of the order in which TURNS and SHEETS are defined in the file, and it is possible for a SINGLE group to be labeled both a TURN and a SHEET.
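
The order dependence can be illustrated with a minimal sketch. It assumes a reader that simply lets the most recently read record overwrite any earlier label for a residue; that rule is an assumption made for illustration, not a description of Jmol's code. The function name assign_structure is hypothetical, and the residue ranges are those of the 1pgb SHEET and TURN records quoted later in this thread.

```python
def assign_structure(records):
    """Apply (kind, first, last) records in order; a later record
    overwrites the label an earlier record gave to the same residue
    (assumed rule, for illustration only)."""
    labels = {}
    for kind, first, last in records:
        for resno in range(first, last + 1):
            labels[resno] = kind
    return labels


# Residue ranges from the 1pgb SHEET and TURN records.
sheets = [("SHEET", 12, 20), ("SHEET", 1, 9), ("SHEET", 50, 56), ("SHEET", 42, 46)]
turns = [("TURN", 9, 12), ("TURN", 47, 50)]

pdb_order = assign_structure(sheets + turns)  # SHEET first, then TURN (PDB file)
cif_order = assign_structure(turns + sheets)  # TURN first, then SHEET (mmCIF file)

# Residues whose label depends only on the order of the records.
differing = sorted(r for r in pdb_order if pdb_order[r] != cif_order[r])
print(differing)
```

Under that assumption, residues 9, 12 and 50 come out as TURN when the SHEET records are read first and as SHEET when the TURN records are read first; every other residue is labeled the same either way.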

So, the bottom line I think (sorry Egon) is that the very complex code added recently to the CIF reader in order to align it with the PDB reader is unnecessary. All it really does is provide a complex path to atom_site.auth_asym_id. The advice I got, "Just always read atom_site.auth_asym_id," appears correct until proven otherwise.
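
A minimal sketch of that advice, assuming each atom_site row has already been parsed into a dictionary keyed by field name. The function name pdb_chain_id and the dictionary layout are hypothetical, and treating the CIF placeholders '?' and '.' as a blank chain is an interpretation of this thread, not code taken from Jmol.

```python
def pdb_chain_id(row):
    """Return the PDB-style chain ID for one parsed atom_site row.

    `row` is a dict mapping mmCIF field names to string values
    (a hypothetical layout chosen for this sketch).
    """
    value = row.get("_atom_site.auth_asym_id", "?")
    # '?' (unknown) and '.' (not applicable) are CIF placeholders for a
    # missing value; map them to a blank chain, as in the PDB file.
    return " " if value in ("?", ".") else value


# Values taken from the 1d66 atom_site rows quoted later in this thread.
leu = {"_atom_site.label_asym_id": "D", "_atom_site.auth_asym_id": "B"}
cadmium = {"_atom_site.label_asym_id": "H", "_atom_site.auth_asym_id": "?"}

leu_chain = pdb_chain_id(leu)          # "B", the chain ID in the PDB file
cadmium_chain = pdb_chain_id(cadmium)  # " ", no chain, as in the PDB file
```

The label_asym_id values ('D' and 'H') are never consulted, which is the whole point of the advice: the auth_* field alone carries the PDB chain ID.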


We can keep testing....

Bob



Bob Hanson wrote:

Eric, Miguel, et al.,

I just uploaded the following fix:

changes '0' to ' ' in CIF reader to make it read consistently with the PDB reader
when no chain is indicated. To select atoms without a chain, simply

select *:

(as when the PDB file is read currently)

Trunk is ready for (monthly? perhaps?) integration into 10.2.
Right now the cif readers in the 10.x prototype and 10.2 are quite different in how they approach the reading of mmCIF files. This is because Egon and I have received two different messages from the RCSB group, and because I haven't fixed 10.x (x for "experimental", by the way) based on what I received. That's interesting, I think, and worth exploring. So I plan on introducing some fixes into the prototype only (bob200603) that will match what I was told. Then we can compare and see what works.
Bob


Eric Martz wrote:
Dear Bob,
These issues are complicated and I'm afraid I don't have time now to think carefully about them. I'm sure I'll get back to mmCIF sometime later and then you'll hear from me.
When I was looking at Jmol's reading of mmCIF files, it appeared to me that everything that didn't have a chain was assigned to chain 0, including water and ligands. This does not seem useful and indeed would lead to lots of problems, I think.
When a file has a single chain that is not named by the authors, I think it would be OK to assign that chain the name "0". That might be useful. But only residues (amino acids or nucleotides) that are covalently bonded into the chain should be members.
I appreciate the work you are doing on Jmol very much!
-Eric

At 4/19/06, you wrote:
Eric, I'm forwarding this snippet to you; pardon me if it's unreadable. Basically Jmol is doing the correct analysis. 1bkx has a "correction" to the PDB in the CIF that I think you are going to see generally. Primarily, here are some of the issues with those files you mention:
1bkx

HETATM records can't have chain IDs
TPO group now classified as ATOM, not HETATM
A   group now classified as HETATM, not ATOM

1d66

no issues

1pgb

HETATM records can't have chain IDs

Three questions:

1. That sound about right?
2. Are you OK with the idea that the CIF files are not going to read the same as the PDB files?
3. How do you feel about this assigning of "0" to chain IDs that were blank in the PDB files? (I can't figure out why that is being done. I'll ask Egon.)
-------- Original Message --------
Subject: Re: two mmCIF issues
Date: Tue, 18 Apr 2006 23:30:32 -0400
From: Zukang Feng <[EMAIL PROTECTED]>
To: info <[EMAIL PROTECTED]>
CC: Bob Hanson <[EMAIL PROTECTED]>
References: <[EMAIL PROTECTED]><[EMAIL PROTECTED]> <[EMAIL PROTECTED]><[EMAIL PROTECTED]>
Hi Bob,

See my comments below.

Rachel Kramer Green wrote:
Sure.  You can reach Zukang Feng at [EMAIL PROTECTED]

Rachel

Bob Hanson wrote:
Thank you, Rachel, for the reply. Is it OK to contact this person directly?
I have some comments below:


Rachel Kramer Green wrote:

> Dear Dr. Hanson,
>
> Thank you for your email message.
>
> I forwarded your questions to our resident mmCIF expert and his comments are entered below.
>
> Please write again with any additional questions you have.
>
> Sincerely,
> Rachel Green
>
>***************************
>Rachel Kramer Green, Ph.D.
>RCSB PDB
>
>[EMAIL PROTECTED]
>***************************
>
>
>
> Bob Hanson wrote:
>
>
>> 1d66 cif and pdb
>>
>> First, is there a mistake in the way the cif generator is creating HETATM records?
>>
>> In the PDB:
>>
>> ATOM 1710 CD2 LEU B 64 0.597 41.712 31.083 1.00 30.29 1D661797
>> ...
>> HETATM 1715 CD CD 42 33.200 64.497 45.835 1.00 39.60 1D661802
>> HETATM 1716 O HOH 301 40.594 60.277 53.968 1.00 16.15 1D661803
>>
>>
>> and I see in the CIF:
>>
>> ...
>> ATOM 1707 C CD2 . LEU D 3 64 ? 0.597 41.712 31.083 1.00 30.29 ? ? ? ? ?
>> 64  LEU B CD2 1
>> ...
>> HETATM 1711 CD CD . CD H 4 . ? 33.200 64.497 45.835 1.00 39.60 ? ? ? ? ?
>> 42  CD  ? CD  1
>> HETATM 1712 O O . HOH I 5 . ? 40.594 60.277 53.968 1.00 16.15 ? ? ? ? ?
>> 301 HOH ? O   1
>>
>>
>
> Always use _atom_site.auth_seq_id, _atom_site.auth_comp_id, _atom_site.auth_asym_id, _atom_site.auth_atom_id for PDB nomenclature.
>

OK, that's very helpful. Always use auth*....but...

>>
>>
>> 1bkx cif and pdb
>>
>> Here in the pdb we have:
>>
>> ATOM 2801 OXT PHE A 350 16.196 56.895 6.121 1.00 79.11 O
>> TER 2802 PHE A 350
>> ATOM 2803 O5* A B 351 8.480 41.650 18.848 1.00 32.82 O
>>
>> and in the cif:
>>
>>
>> ATOM 2801 O 'O''' . PHE A 1 350 ? 16.196 56.895 6.121 1.00 79.11 ? ? ? ?
>> ? 350 PHE A OXT 1
>> HETATM 2802 O 'O5'' . A B 2 . ? 8.480 41.650 18.848 1.00 32.82 ? ? ? ?
>> ? 351 A   ? O5* 1
>>
>> First, why the change to HETATM? Second, why is that last ? there in the cif file? The field defs are:
>
>
> Because nucleotide 'A' treated as het group, not the monomer in polynucleotide chain. The last '?' should be '_atom_site.auth_asym_id' which means PDB chain ID.
Right, but that is "B" in the PDB files. So is this an error? If we use the auth_* fields as recommended above,
then we would have "?" not "B". That's what is confusing me.
I see the problem. The original PDB file assigned chain ID 'B' and
SEQRES record to single nucleotide 'A'. That was a mistake in the PDB
file. When we converted PDB file into mmCIF files, we fixed such
problems. There should be no chain ID and SEQRES record for nucleotide 'A'.
>> Note that we have:
>>
>> #
>> loop_
>> _struct_asym.id
>> _struct_asym.pdbx_blank_PDB_chainid_flag
>> _struct_asym.pdbx_modified
>> _struct_asym.entity_id
>> _struct_asym.details
>> A N N 1 ?
>> B N N 2 ?
>> C N N 3 ?
>> #
>>
>> I would have read that "no changes from author definitions." So what is this block really telling me?
>
>
> In this case, it tells you there are three asym ids 'A', 'B', 'C' and it did not mean anything. The most important item in this category is '_struct_asym.pdbx_blank_PDB_chainid_flag'. If its value is 'Y', it means in the original PDB file, there is no PDB chain ID for single chain, but we add chain ID (usually 'A') in mmCIF file.
When you say "did not mean anything", what do you mean? I thought it meant "not changed".
You can say that. But the flag only indicates for polymers, not for het
groups.
Why wouldn't _atom_site.auth_asym_id be "B" in this case in the HETATM record?
I explained above.
When the "N" is here, should I use _atom_site.label_asym_id instead of _atom_site.auth_asym_id ?
You should always use _atom_site.auth_asym_id for PDB chain IDs.
>
>>
>> Along the same lines, we have 1pgb:
>>
>> 1pgb PDB:
>>
>> ATOM 436 OXT GLU 56 6.410 6.617 4.667 1.00 24.74 1PGB 505
>> TER 437 GLU 56 1PGB 506
>> HETATM 438 O HOH 57 12.132 8.422 11.247 1.00 8.87 1PGB 507
>>
>> 1pgb CIF:
>>
>> and
>>
>> ATOM 436 O 'O''' . GLU A 1 56 ? 6.410 6.617 4.667 1.00 24.74 ? ? ? ? ? 56
>> GLU A OXT 1
>> HETATM 437 O O . HOH B 2 . ? 12.132 8.422 11.247 1.00 8.87 ? ? ? ? ? 57
>> HOH ? O 1
>>
>>
>> #
>> loop_
>> _struct_asym.id
>> _struct_asym.pdbx_blank_PDB_chainid_flag
>> _struct_asym.pdbx_modified
>> _struct_asym.entity_id
>> _struct_asym.details
>> A Y N 1 ?
>> B N N 2 ?
>> #
>>
>> So I see why there is a Y for A -- that was blank in the PDB file.
>> But why are the Ns there for B? The atom site field has been changed to B.
>
>
> We only added chain ID for polymer.
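
The flag discussed above can be read with a short sketch like the following. It assumes a hypothetical helper, parse_loop, that handles only simple loop_ blocks with whitespace-separated values and no quoted or multi-line strings; real mmCIF files need a fuller parser. The data is the 1pgb struct_asym block quoted above.

```python
def parse_loop(text):
    """Parse one simple CIF loop_ block into a list of dicts.

    Handles only whitespace-separated values (no quoting, no
    multi-line strings); a simplification made for this sketch.
    """
    names, rows = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line or line in ("#", "loop_"):
            continue
        if line.startswith("_"):
            names.append(line)  # field-name line
        else:
            rows.append(dict(zip(names, line.split())))  # data line
    return rows


STRUCT_ASYM_1PGB = """\
loop_
_struct_asym.id
_struct_asym.pdbx_blank_PDB_chainid_flag
_struct_asym.pdbx_modified
_struct_asym.entity_id
_struct_asym.details
A Y N 1 ?
B N N 2 ?
"""

rows = parse_loop(STRUCT_ASYM_1PGB)
# Asym IDs whose chain ID was blank in the original PDB file.
blank_in_pdb = [
    r["_struct_asym.id"]
    for r in rows
    if r["_struct_asym.pdbx_blank_PDB_chainid_flag"] == "Y"
]
print(blank_in_pdb)
```

For 1pgb only asym ID 'A' (the polymer) carries the 'Y' flag; the water in 'B' is flagged 'N' even though its PDB chain was also blank, which matches the statement that the flag is set only for polymers.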


>
>>
>> Another issue, this time with both CIF and PDB:
>>
>> SHEET 1 S1 4 LEU 12 ALA 20 0 1PGB 57
>> SHEET 2 S1 4 MET 1 GLY 9 -1 1PGB 58
>> SHEET 3 S1 4 LYS 50 GLU 56 1 1PGB 59
>> SHEET 4 S1 4 GLU 42 ASP 46 -1 1PGB 60
>> TURN 1 T1 GLY 9 LEU 12 H-BOND ABSENT 9-12 1PGB 61
>> TURN 2 T2 ASP 47 LYS 50 LYS 50 IN L-HELIX CONFORMATION 1PGB 62
>>
>>
>> Note the duplication of 9 and of 50. How can a residue be in both a sheet and a turn? Is that a mistake? Is it common? This becomes an issue for us because we have to display one or the other; we don't have a way of a group being both in a turn and in a beta-pleated sheet. But this leads to differences in structure for PDB and mmCIF, just because of the order with which these are defined -- SHEET,TURN in this PDB; TURN,SHEET in mmCIF. What do you recommend?
>
>
> I think it has to do with which program you used to calculate secondary structures. I think it's possible for terminal residues in both secondary structures units although it's not very common.
>
>>
>> Finally, if you can help me out on understanding the interrelated roles of asym_id, label_entity_id, auth_asym_id, and similar for seq_id, I would really appreciate it.
>
>
> _atom_site.label_atom_id, _atom_site.label_comp_id, _atom_site.label_asym_id, _atom_site.label_seq_id, _atom_site.label_entity_id are items used to define cif nomenclature. For atom name (_atom_site.label_atom_id), we use IUPAC nomenclature. The residue name (_atom_site.label_comp_id) should be same as PDB nomenclature. The sequence number (always start 1) and asym ID (always start 'A') are automatically generated by program.
>
> _atom_site.auth_atom_id, _atom_site.auth_comp_id, _atom_site.auth_asym_id, _atom_site.auth_seq_id are used to define PDB nomenclature. In most cases, it should match with PDB files.
OK, but we have two cases above where it does not. My question then is whether this is a mmCIF error or intentional. What exactly does it mean when these fields do not match the PDB file?
In these cases, they are intentional. Currently we are working on a
project to cleanup all entries in PDB archive. The goal of this project
is to fix as much errors as we can and make files more consistent across
whole archive. We hope we'll have a new set of mmCIF files (less error,
but not error free) by the end of year. There will be lots of
differences between mmCIF and PDB files. It could have some mistakes
during conversion. Overall mmCIF files definitely will be better than
original PDB files.
>
>
>>
>> Bob Hanson
>> Jmol Development Team
>> [EMAIL PROTECTED]
>>
>>
>
>
>
--

Robert M. Hanson, [EMAIL PROTECTED], 507-646-3107
Professor of Chemistry, St. Olaf College
1520 St. Olaf Ave., Northfield, MN 55057
mailto:[EMAIL PROTECTED]
http://www.stolaf.edu/people/hansonr

"Imagination is more important than knowledge."  - Albert Einstein
/* - - - - - - - - - - - - - - - - - - - - - - - - - - -
Eric Martz, Professor Emeritus, Dept Microbiology
U Mass, Amherst -- http://www.umass.edu/molvis/martz

Biochem 3D Education Resources http://MolviZ.org
See 3D Molecules, Install Nothing! - http://firstglance.jmol.org
Protein Explorer - 3D Visualization: http://proteinexplorer.org
Workshops: http://workshops.proteinexplorer.org
World Index of Molecular Visualization Resources: http://molvisindex.org
ConSurf - Find Conserved Patches in Proteins: http://consurf.tau.ac.il
Atlas of Macromolecules: http://atlas.proteinexplorer.org
PDB Lite Macromolecule Finder: http://pdblite.org
Molecular Visualization EMail List (molvis-list):
      http://bioinformatics.org/mailman/listinfo/molvis-list
- - - - - - - - - - - - - - - - - - - - - - - - - - - */
_______________________________________________
Jmol-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/jmol-developers


