OK, turns out there isn't any problem with this in the prototype CIF reader.

Both 10.2 and 10.x read PDB files identically with this '0' to ' ' change.

Eric, both trouble makers 1bkx and 1d66 and several others read identically in CIF and PDB.

Some files (1pgb) read differently in CIF and PDB intentionally: RCSB has redefined or "corrected" the files in terms of what's an atom, hetatm, etc., and what can have a chain designation (not heteroatoms).

Some files (1pgb) read differently in CIF and PDB because of the order in which TURNS and SHEETS are defined in the file, and it is possible for a SINGLE group to be labeled both a TURN and a SHEET.

So, the bottom line I think (sorry Egon) is that the very complex code added recently to the CIF reader in order to align it with the PDB reader is unnecessary. All it really does is provide a complex path to atom_site.auth_asym_id. The advice I got, "Just always read atom_site.auth_asym_id," appears correct until proven otherwise.
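
A minimal sketch of that advice, in Python rather than the Java of the actual reader. The column order is an assumption inferred from the 25-value atom_site rows quoted further down in this thread; a real reader should take the order from the loop_ header of the file. The function name is hypothetical, and the plain whitespace split does not handle quoted CIF values.

```python
# Column order ASSUMED from the atom_site rows quoted in this thread; a real
# reader should build this list from the loop_ header of the file being read.
ATOM_SITE_COLUMNS = [
    "group_PDB", "id", "type_symbol", "label_atom_id", "label_alt_id",
    "label_comp_id", "label_asym_id", "label_entity_id", "label_seq_id",
    "pdbx_PDB_ins_code", "Cartn_x", "Cartn_y", "Cartn_z", "occupancy",
    "B_iso_or_equiv", "Cartn_x_esd", "Cartn_y_esd", "Cartn_z_esd",
    "occupancy_esd", "B_iso_or_equiv_esd", "auth_seq_id", "auth_comp_id",
    "auth_asym_id", "auth_atom_id", "pdbx_PDB_model_num",
]


def parse_atom_site_row(row, columns=ATOM_SITE_COLUMNS):
    """Map one whitespace-separated atom_site row onto its column names.

    Hypothetical helper: it does not handle quoted values containing spaces.
    """
    values = row.split()
    if len(values) != len(columns):
        raise ValueError(f"expected {len(columns)} values, got {len(values)}")
    return dict(zip(columns, values))


# The 1d66 rows quoted later in the thread, joined onto one line each.
leu = parse_atom_site_row(
    "ATOM 1707 C CD2 . LEU D 3 64 ? 0.597 41.712 31.083 1.00 30.29 "
    "? ? ? ? ? 64 LEU B CD2 1"
)
cd = parse_atom_site_row(
    "HETATM 1711 CD CD . CD H 4 . ? 33.200 64.497 45.835 1.00 39.60 "
    "? ? ? ? ? 42 CD ? CD 1"
)

# label_asym_id is the generated CIF chain; auth_asym_id is the PDB chain ID.
print(leu["label_asym_id"], leu["auth_asym_id"])
print(cd["label_asym_id"], cd["auth_asym_id"])
```

For the LEU atom the two fields differ (generated chain D, PDB chain B), and for the CD heteroatom auth_asym_id is '?', which is the "no chain" case discussed below.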

We can keep testing....

Bob



Bob Hanson wrote:
Eric, Miguel, et al.,

I just uploaded the following fix:

changes '0' to ' ' in CIF reader to make it read consistently with the PDB reader
when no chain is indicated. To select atoms without a chain, simply

select *:

(as when the PDB file is read currently)
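
A rough sketch of the idea behind this fix, written in Python for brevity and not taken from the Jmol source. The helper names and the atom dictionaries are hypothetical, and treating a literal '0' as "no chain" is an assumption of the sketch.

```python
NO_CHAIN = " "


def normalize_chain_id(raw):
    """Return ' ' for any value that stands for 'no chain', else the value."""
    # '?' and '.' are the CIF markers for unknown and inapplicable values;
    # '0' is the placeholder the CIF reader used before this fix. Mapping a
    # literal '0' to "no chain" is an ASSUMPTION of this sketch.
    if raw is None or raw.strip() in ("", "?", ".", "0"):
        return NO_CHAIN
    return raw


def select_without_chain(atoms):
    """Rough analogue of `select *:`: keep atoms whose chain ID is blank."""
    return [a for a in atoms if normalize_chain_id(a["chain"]) == NO_CHAIN]


# Hypothetical atoms: one polymer atom and three heteroatoms without a chain.
atoms = [
    {"name": "CD2", "group": "LEU", "chain": "B"},
    {"name": "CD", "group": "CD", "chain": "?"},
    {"name": "O", "group": "HOH", "chain": "0"},
    {"name": "O", "group": "HOH", "chain": " "},
]

for atom in select_without_chain(atoms):
    print(atom["group"], atom["name"])
```

With one normalization step, atoms read from CIF and from PDB end up with the same blank chain ID, so the same selection works for both.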

Trunk is ready for (monthly? perhaps?) integration into 10.2.

Right now the CIF readers in the 10.x prototype and 10.2 are quite different in how they approach the reading of mmCIF files. This is because Egon and I have received two different messages from the RCSB group, and because I haven't fixed 10.x (x for "experimental", by the way) based on what I received. That's interesting, I think, and worth exploring. So I plan on introducing some fixes into the prototype only (bob200603) that will match what I was told. Then we can compare and see what works.

Bob


Eric Martz wrote:

Dear Bob,

These issues are complicated and I'm afraid I don't have time now to think carefully about them. I'm sure I'll get back to mmCIF sometime later and then you'll hear from me.

When I was looking at Jmol's reading of mmCIF files, it appeared to me that everything that didn't have a chain was assigned to chain 0, including water and ligands. This does not seem useful and indeed would lead to lots of problems, I think.

When a file has a single chain that is not named by the authors, I think it would be OK to assign that chain the name "0". That might be useful. But only residues (amino acids or nucleotides) that are covalently bonded into the chain should be members.

I appreciate the work you are doing on Jmol very much!
-Eric

At 4/19/06, you wrote:

Eric, I'm forwarding this snippet to you; pardon me if it's unreadable. Basically Jmol is doing the correct analysis. 1bkx has a "correction" to the PDB in the CIF that I think you are going to see generally. Primarily, here are some of the issues with those files you mention:

1bkx

HETATM records can't have chain IDs
TPO group now classified as ATOM, not HETATM
A   group now classified as HETATM, not ATOM

1d66

no issues

1pgb

HETATM records can't have chain IDs

Three questions:

1. That sound about right?

2. Are you OK with the idea that the CIF files are not going to read the same as the PDB files?

3. How do you feel about this assigning of "0" to chain IDs that were blank in the PDB files? (I can't figure out why that is being done. I'll ask Egon.)



-------- Original Message --------
Subject: Re: two mmCIF issues
Date: Tue, 18 Apr 2006 23:30:32 -0400
From: Zukang Feng <[EMAIL PROTECTED]>
To: info <[EMAIL PROTECTED]>
CC: Bob Hanson <[EMAIL PROTECTED]>
References: <[EMAIL PROTECTED]> <[EMAIL PROTECTED]> <[EMAIL PROTECTED]> <[EMAIL PROTECTED]>

Hi Bob,

See my comments below.

Rachel Kramer Green wrote:

Sure.  You can reach Zukang Feng at [EMAIL PROTECTED]

Rachel

Bob Hanson wrote:

Thank you, Rachel, for the reply. Is it OK to contact this person directly?
I have some comments below:


Rachel Kramer Green wrote:

> Dear Dr. Hanson,
>
> Thank you for your email message.
>
> I forwarded your questions to our resident mmCIF expert and his comments are entered below.
>
> Please write again with any additional questions you have.
>
> Sincerely,
> Rachel Green
>
>***************************
>Rachel Kramer Green, Ph.D.
>RCSB PDB
>
>[EMAIL PROTECTED]
>***************************
>
>
>
> Bob Hanson wrote:
>
>
>> 1d66 cif and pdb
>>
>> First, is there a mistake in the way the cif generator is creating HETATM records?
>>
>> In the PDB:
>>
>> ATOM 1710 CD2 LEU B 64 0.597 41.712 31.083 1.00 30.29 1D661797
>> ...
>> HETATM 1715 CD CD 42 33.200 64.497 45.835 1.00 39.60 1D661802
>> HETATM 1716 O HOH 301 40.594 60.277 53.968 1.00 16.15 1D661803
>>
>>
>> and I see in the CIF:
>>
>> ...
>> ATOM 1707 C CD2 . LEU D 3 64 ? 0.597 41.712 31.083 1.00 30.29 ? ? ? ? ?
>> 64  LEU B CD2 1
>> ...
>> HETATM 1711 CD CD . CD H 4 . ? 33.200 64.497 45.835 1.00 39.60 ? ? ? ? ?
>> 42  CD  ? CD  1
>> HETATM 1712 O O . HOH I 5 . ? 40.594 60.277 53.968 1.00 16.15 ? ? ? ? ?
>> 301 HOH ? O   1
>>
>>
>
> Always use _atom_site.auth_seq_id, _atom_site.auth_comp_id, _atom_site.auth_asym_id, _atom_site.auth_atom_id for PDB nomenclature.
>

OK, that's very helpful. Always use auth*....but...

>>
>>
>> 1bkx cif and pdb
>>
>> Here in the pdb we have:

>> ATOM 2801 OXT PHE A 350 16.196 56.895 6.121 1.00 79.11 O
>> TER 2802 PHE A 350
>> ATOM 2803 O5* A B 351 8.480 41.650 18.848 1.00 32.82 O
>>
>> and in the cif:
>>
>>
>> ATOM 2801 O 'O''' . PHE A 1 350 ? 16.196 56.895 6.121 1.00 79.11 ? ? ? ?
>> ? 350 PHE A OXT 1
>> HETATM 2802 O 'O5'' . A B 2 . ? 8.480 41.650 18.848 1.00 32.82 ? ? ? ?
>> ? 351 A   ? O5* 1
>>
>> First, why the change to HETATM? Second, why is that last ? there in the cif file? The field defs are:
>
>
> Because nucleotide 'A' is treated as a het group, not as a monomer in a polynucleotide chain. The last '?' is '_atom_site.auth_asym_id', which holds the PDB chain ID.

Right, but that is "B" in the PDB files. So is this an error? If we use the auth_* fields as recommended above,
then we would have "?" not "B". That's what is confusing me.



I see the problem. The original PDB file assigned chain ID 'B' and
SEQRES record to single nucleotide 'A'. That was a mistake in the PDB
file. When we converted PDB file into mmCIF files, we fixed such
problems. There should be no chain ID and SEQRES record for nucleotide 'A'.



>> Note that we have:
>>
>> #
>> loop_
>> _struct_asym.id
>> _struct_asym.pdbx_blank_PDB_chainid_flag
>> _struct_asym.pdbx_modified
>> _struct_asym.entity_id
>> _struct_asym.details
>> A N N 1 ?
>> B N N 2 ?
>> C N N 3 ?
>> #
>>
>> I would have read that "no changes from author definitions." So what is this block really telling me?
>
>
> In this case, it tells you there are three asym ids 'A', 'B', 'C' and it did not mean anything. The most important item in this category is '_struct_asym.pdbx_blank_PDB_chainid_flag'. If its value is 'Y', it means in the original PDB file, there is no PDB chain ID for single chain, but we add chain ID (usually 'A') in mmCIF file.

When you say "did not mean anything", what do you mean? I thought it meant "not changed".


You can say that. But the flag applies only to polymers, not to het
groups.
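
A small sketch of how the flag could be read, in Python and under stated assumptions: the column order is copied from the struct_asym loop headers quoted in this thread, the function names are hypothetical, and the rows are the two struct_asym blocks quoted in this message.

```python
# Column order copied from the struct_asym loop_ headers quoted in this thread.
STRUCT_ASYM_COLUMNS = [
    "id", "pdbx_blank_PDB_chainid_flag", "pdbx_modified", "entity_id", "details",
]


def parse_struct_asym(rows):
    """Turn whitespace-separated struct_asym rows into dictionaries."""
    return [dict(zip(STRUCT_ASYM_COLUMNS, row.split())) for row in rows]


def asyms_blank_in_pdb(records):
    """Return the asym IDs whose chain ID was blank in the original PDB file.

    Per the RCSB reply, the flag is only set for polymers, so an 'N' on a
    het group or water says nothing about its chain ID in the PDB file.
    """
    return {r["id"] for r in records
            if r["pdbx_blank_PDB_chainid_flag"] == "Y"}


rows_three_asyms = ["A N N 1 ?", "B N N 2 ?", "C N N 3 ?"]
rows_1pgb = ["A Y N 1 ?", "B N N 2 ?"]

print(asyms_blank_in_pdb(parse_struct_asym(rows_three_asyms)))
print(asyms_blank_in_pdb(parse_struct_asym(rows_1pgb)))
```

Only asym A of 1pgb is reported as blank in the PDB file. The water (asym B) carries 'N' even though it also had no chain ID, which is the polymer-only behavior described above.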



Why wouldn't _atom_site.auth_asym_id be "B" in this case in the HETATM record?


I explained above.



When the "N" is here, should I use _atom_site.label_asym_id instead of _atom_site.auth_asym_id ?


You should always use _atom_site.auth_asym_id for PDB chain IDs.



>
>>
>> Along the same lines, we have 1pgb:
>>
>> 1pgb PDB:
>>
>> ATOM 436 OXT GLU 56 6.410 6.617 4.667 1.00 24.74 1PGB 505
>> TER 437 GLU 56 1PGB 506
>> HETATM 438 O HOH 57 12.132 8.422 11.247 1.00 8.87 1PGB 507
>>
>> 1pgb CIF:
>>
>> and
>>
>> ATOM 436 O 'O''' . GLU A 1 56 ? 6.410 6.617 4.667 1.00 24.74 ? ? ? ? ? 56 GLU A OXT 1
>> HETATM 437 O O . HOH B 2 . ? 12.132 8.422 11.247 1.00 8.87 ? ? ? ? ? 57 HOH ? O 1
>>
>>
>> #
>> loop_
>> _struct_asym.id
>> _struct_asym.pdbx_blank_PDB_chainid_flag
>> _struct_asym.pdbx_modified
>> _struct_asym.entity_id
>> _struct_asym.details
>> A Y N 1 ?
>> B N N 2 ?
>> #
>>
>> So I see why there is a Y for A -- that was blank in the PDB file.
>> But why are the Ns there for B? The atom site field has been changed to B.

>
>
> We only added chain ID for polymer.


>
>>
>> Another issue, this time with both CIF and PDB:
>>
>> SHEET 1 S1 4 LEU 12 ALA 20 0 1PGB 57
>> SHEET 2 S1 4 MET 1 GLY 9 -1 1PGB 58
>> SHEET 3 S1 4 LYS 50 GLU 56 1 1PGB 59
>> SHEET 4 S1 4 GLU 42 ASP 46 -1 1PGB 60
>> TURN 1 T1 GLY 9 LEU 12 H-BOND ABSENT 9-12 1PGB 61
>> TURN 2 T2 ASP 47 LYS 50 LYS 50 IN L-HELIX CONFORMATION 1PGB 62
>>
>>
>> Note the duplication of 9 and of 50. How can a residue be in both a sheet and a turn? Is that a mistake? Is it common? This becomes an issue for us because we have to display one or the other; we don't have a way of a group being both in a turn and in a beta-pleated sheet. But this leads to differences in structure for PDB and mmCIF, just because of the order with which these are defined -- SHEET,TURN in this PDB; TURN,SHEET in mmCIF. What do you recommend?
>
>
> I think it has to do with which program you used to calculate secondary structures. I think it's possible for terminal residues to be in both secondary structure units, although it's not very common.
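
The order dependence can be illustrated with a short Python sketch. It assumes a "last record wins" rule for residues that appear in both a SHEET and a TURN; that rule is an assumption made for illustration, not a statement of what the Jmol readers actually do. The residue ranges are the 1pgb records quoted above.

```python
def assign_structure(records_in_file_order):
    """Label residues by number; a later record overwrites an earlier one.

    The "last record wins" rule is an ASSUMPTION made for this illustration.
    """
    labels = {}
    for kind, first, last in records_in_file_order:
        for resno in range(first, last + 1):
            labels[resno] = kind
    return labels


# Residue ranges from the 1pgb SHEET and TURN records quoted above.
SHEETS = [("SHEET", 12, 20), ("SHEET", 1, 9), ("SHEET", 50, 56), ("SHEET", 42, 46)]
TURNS = [("TURN", 9, 12), ("TURN", 47, 50)]

pdb_order = assign_structure(SHEETS + TURNS)  # SHEET records first, as in the PDB file
cif_order = assign_structure(TURNS + SHEETS)  # TURN records first, as in the mmCIF file

# Residues whose label depends only on the order of the records.
differing = sorted(r for r in pdb_order if pdb_order[r] != cif_order[r])
print(differing)
```

Under this rule the two orders disagree on every residue that sits in both a turn and a sheet. For these records that includes residue 12 as well as 9 and 50, since TURN 9-12 also overlaps the SHEET strand 12-20.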
>
>>
>> Finally, if you can help me out on understanding the interrelated roles of asym_id, label_entity_id, auth_asym_id, and similar for seq_id, I would really appreciate it.
>
>
> _atom_site.label_atom_id, _atom_site.label_comp_id, _atom_site.label_asym_id, _atom_site.label_seq_id, _atom_site.label_entity_id are the items used to define CIF nomenclature. For the atom name (_atom_site.label_atom_id), we use IUPAC nomenclature. The residue name (_atom_site.label_comp_id) should be the same as in PDB nomenclature. The sequence number (always starting at 1) and asym ID (always starting at 'A') are generated automatically by the program.
>
> _atom_site.auth_atom_id, _atom_site.auth_comp_id, _atom_site.auth_asym_id, _atom_site.auth_seq_id are used to define PDB nomenclature. In most cases, they should match the PDB files.

OK, but we have two cases above where it does not. My question then is whether this is a mmCIF error or intentional. What exactly does it mean when these fields do not match the PDB file?


In these cases, they are intentional. Currently we are working on a
project to clean up all entries in the PDB archive. The goal of this project
is to fix as many errors as we can and make files more consistent across
the whole archive. We hope we'll have a new set of mmCIF files (fewer errors,
but not error free) by the end of the year. There will be lots of
differences between mmCIF and PDB files. There could be some mistakes
introduced during conversion. Overall, the mmCIF files will definitely be
better than the original PDB files.




>
>
>>
>> Bob Hanson
>> Jmol Development Team
>> [EMAIL PROTECTED]
>>
>>
>
>
>





--

Robert M. Hanson, [EMAIL PROTECTED], 507-646-3107
Professor of Chemistry, St. Olaf College
1520 St. Olaf Ave., Northfield, MN 55057
mailto:[EMAIL PROTECTED]
http://www.stolaf.edu/people/hansonr

"Imagination is more important than knowledge."  - Albert Einstein



/* - - - - - - - - - - - - - - - - - - - - - - - - - - -
Eric Martz, Professor Emeritus, Dept Microbiology
U Mass, Amherst -- http://www.umass.edu/molvis/martz

Biochem 3D Education Resources http://MolviZ.org
See 3D Molecules, Install Nothing! - http://firstglance.jmol.org
Protein Explorer - 3D Visualization: http://proteinexplorer.org
Workshops: http://workshops.proteinexplorer.org
World Index of Molecular Visualization Resources: http://molvisindex.org
ConSurf - Find Conserved Patches in Proteins: http://consurf.tau.ac.il
Atlas of Macromolecules: http://atlas.proteinexplorer.org
PDB Lite Macromolecule Finder: http://pdblite.org
Molecular Visualization EMail List (molvis-list):
      http://bioinformatics.org/mailman/listinfo/molvis-list
- - - - - - - - - - - - - - - - - - - - - - - - - - - */





-------------------------------------------------------
Using Tomcat but need to do more? Need to support web services, security?
Get stuff done quickly with pre-integrated technology to make your job easier
Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo
http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642
_______________________________________________
Jmol-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/jmol-developers
