OK, turns out there isn't any problem with this in the prototype CIF
reader.
Both 10.2 and 10.x read PDB files identically with this '0' to ' '
change.
Eric, both trouble makers 1bkx and 1d66 and several others read
identically in CIF and PDB.
Some files (1pgb) intentionally read differently in CIF and PDB -- RCSB
has redefined or "corrected" the files in terms of what's an atom,
hetatm, etc., and what can have a chain designation (not heteroatoms).
Some files (1pgb) read differently in CIF and PDB because of the order in
which TURNS and SHEETS are defined in the file, and because it is possible
for a SINGLE group to be labeled both a TURN and a SHEET.
So, the bottom line I think (sorry Egon) is that the very complex code
added recently to the CIF reader in order to align it with the PDB
reader is unnecessary. All it really does is provide a complex path to
atom_site.auth_asym_id. The advice I got, "Just always read
atom_site.auth_asym_id," appears correct until proven otherwise.
We can keep testing....
Bob
Bob Hanson wrote:
Eric, Miguel, et al.,
I just uploaded the following fix:
changes '0' to ' ' in the CIF reader so that it reads consistently with the
PDB reader when no chain is indicated. To select atoms without a chain, simply
select *:
(as when the PDB file is read currently)
Trunk is ready for (monthly? perhaps?) integration into 10.2.
Right now the cif readers in the 10.x prototype and 10.2 are quite
different in how they approach the reading of mmCIF files. This is
because Egon and I have received two different messages from the RCSB
group, and because I haven't fixed 10.x (x for "experimental", by the
way) based on what I received. That's interesting, I think, and worth
exploring. So I plan to introduce some fixes into the prototype only
(bob200603) that will match what I was told. Then we can compare and see
what works.
Bob
Eric Martz wrote:
Dear Bob,
These issues are complicated and I'm afraid I don't have time now to
think carefully about them. I'm sure I'll get back to mmCIF sometime
later and then you'll hear from me.
When I was looking at Jmol's reading of mmCIF files, it appeared to me
that everything that didn't have a chain was assigned to chain 0,
including water and ligands. This does not seem useful and indeed
would lead to lots of problems, I think.
When a file has a single chain that is not named by the authors, I
think it would be OK to assign that chain the name "0". That might be
useful. But only residues (amino acids or nucleotides) that are
covalently bonded into the chain should be members.
I appreciate the work you are doing on Jmol very much!
-Eric
At 4/19/06, you wrote:
Eric, I'm forwarding this snippet to you; pardon me if it's
unreadable. Basically Jmol is doing the correct analysis. 1bkx has a
"correction" to the PDB in the CIF that I think you are going to see
generally. Primarily, here are some of the issues with those files
you mention:
1bkx
HETATM records can't have chain IDs
TPO group now classified as ATOM, not HETATM
A group now classified as HETATM, not ATOM
1d66
no issues
1pgb
HETATM records can't have chain IDs
Three questions:
1. That sound about right?
2. Are you OK with the idea that the CIF files are not going to read
the same as the PDB files?
3. How do you feel about this assigning of "0" to chain IDs that were
blank in the PDB files? (I can't figure out why that is being done.
I'll ask Egon.)
-------- Original Message --------
Subject: Re: two mmCIF issues
Date: Tue, 18 Apr 2006 23:30:32 -0400
From: Zukang Feng <[EMAIL PROTECTED]>
To: info <[EMAIL PROTECTED]>
CC: Bob Hanson <[EMAIL PROTECTED]>
References: <[EMAIL PROTECTED]>
<[EMAIL PROTECTED]> <[EMAIL PROTECTED]>
<[EMAIL PROTECTED]>
Hi Bob,
See my comments below.
Rachel Kramer Green wrote:
Sure. You can reach Zukang Feng at [EMAIL PROTECTED]
Rachel
Bob Hanson wrote:
Thank you, Rachel, for the reply. Is it OK to contact this person
directly?
I have some comments below:
Rachel Kramer Green wrote:
> Dear Dr. Hanson,
>
> Thank you for your email message.
>
> I forwarded your questions to our resident mmCIF expert and his
comments are entered below.
>
> Please write again with any additional questions you have.
>
> Sincerely,
> Rachel Green
>
>***************************
>Rachel Kramer Green, Ph.D.
>RCSB PDB
>
>[EMAIL PROTECTED]
>***************************
>
>
>
> Bob Hanson wrote:
>
>
>> 1d66 cif and pdb
>>
>> First, is there a mistake in the way the cif generator is
creating HETATM records?
>>
>> In the PDB:
>>
>> ATOM 1710 CD2 LEU B 64 0.597 41.712 31.083 1.00
30.29 1D661797
>> ...
>> HETATM 1715 CD CD 42 33.200 64.497 45.835 1.00
39.60 1D661802
>> HETATM 1716 O HOH 301 40.594 60.277 53.968 1.00
16.15 1D661803
>>
>>
>> and I see in the CIF:
>>
>> ...
>> ATOM 1707 C CD2 . LEU D 3 64 ? 0.597 41.712 31.083 1.00
30.29 ? ? ? ? ?
>> 64 LEU B CD2 1
>> ...
>> HETATM 1711 CD CD . CD H 4 . ? 33.200 64.497 45.835 1.00
39.60 ? ? ? ? ?
>> 42 CD ? CD 1
>> HETATM 1712 O O . HOH I 5 . ? 40.594 60.277 53.968 1.00
16.15 ? ? ? ? ?
>> 301 HOH ? O 1
>>
>>
>
> Always use _atom_site.auth_seq_id, _atom_site.auth_comp_id,
_atom_site.auth_asym_id, _atom_site.auth_atom_id for PDB nomenclature.
>
OK, that's very helpful. Always use auth*....but...
>>
>>
>> :1
cif and pdb
>>
>> Here in the pdb we have:
>> ATOM 2801 OXT PHE A 350 16.196 56.895 6.121 1.00 79.11 O
>> TER 2802 PHE A 350
>> ATOM 2803 O5* A B 351 8.480 41.650 18.848 1.00 32.82 O
>>
>> and in the cif:
>>
>>
>> ATOM 2801 O 'O''' . PHE A 1 350 ? 16.196 56.895 6.121 1.00
79.11 ? ? ? ?
>> ? 350 PHE A OXT 1
>> HETATM 2802 O 'O5'' . A B 2 . ? 8.480 41.650 18.848 1.00
32.82 ? ? ? ?
>> ? 351 A ? O5* 1
>>
>> First, why the change to HETATM? Second, why is that last ?
there in the cif file? The field defs are:
>
>
> Because nucleotide 'A' is treated as a het group, not as a monomer in
a polynucleotide chain. The last '?' should be
'_atom_site.auth_asym_id', which means the PDB chain ID.
Right, but that is "B" in the PDB files. So is this an error? If we
use the auth_* fields as recommended above,
then we would have "?" not "B". That's what is confusing me.
I see the problem. The original PDB file assigned chain ID 'B' and
SEQRES record to single nucleotide 'A'. That was a mistake in the PDB
file. When we converted PDB file into mmCIF files, we fixed such
problems. There should be no chain ID and SEQRES record for
nucleotide 'A'.
>> Note that we have:
>>
>> #
>> loop_
>> _struct_asym.id
>> _struct_asym.pdbx_blank_PDB_chainid_flag
>> _struct_asym.pdbx_modified
>> _struct_asym.entity_id
>> _struct_asym.details
>> A N N 1 ?
>> B N N 2 ?
>> C N N 3 ?
>> #
>>
>> I would have read that "no changes from author definitions." So
what is this block really telling me?
>
>
> In this case, it tells you there are three asym ids 'A', 'B', 'C'
and it did not mean anything. The most important item in this
category is '_struct_asym.pdbx_blank_PDB_chainid_flag'. If its
value is 'Y', it means in the original PDB file, there is no PDB
chain ID for single chain, but we add chain ID (usually 'A') in
mmCIF file.
When you say "did not mean anything", what do you mean? I thought
it meant "not changed".
You can say that. But the flag only applies to polymers, not to het
groups.
Why wouldn't _atom_site.auth_asym_id be "B" in this case in the
HETATM record?
I explained above.
When the "N" is here, should I use _atom_site.label_asym_id instead
of _atom_site.auth_asym_id ?
You should always use _atom_site.auth_asym_id for PDB chain IDs.
>
>>
>> Along the same lines, we have 1pgb:
>>
>> 1pgb PDB:
>>
>> ATOM 436 OXT GLU 56 6.410 6.617 4.667 1.00
24.74 1PGB 505
>> TER 437 GLU
56 1PGB 506
>> HETATM 438 O HOH 57 12.132 8.422 11.247 1.00
8.87 1PGB 507
>>
>> 1pgb CIF:
>>
>> and
>>
>> ATOM 436 O 'O''' . GLU A 1 56 ? 6.410 6.617 4.667 1.00
24.74 ? ? ? ? ? 56
>> GLU A OXT 1
>> HETATM 437 O O . HOH B 2 . ? 12.132 8.422 11.247 1.00 8.87
? ? ? ? ? 57
>> HOH ? O 1
>>
>>
>> #
>> loop_
>> _struct_asym.id
>> _struct_asym.pdbx_blank_PDB_chainid_flag
>> _struct_asym.pdbx_modified
>> _struct_asym.entity_id
>> _struct_asym.details
>> A Y N 1 ?
>> B N N 2 ?
>> #
>>
>> So I see why there is a Y for A -- that was blank in the PDB file.
>> But why are the Ns there for B? The atom site field has been
changed to B.
>
>
> We only added chain ID for polymer.
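To make the role of the flag concrete, here is a small Python sketch that parses a _struct_asym loop like the ones quoted above and lists the asym IDs marked 'Y', that is, chain IDs added for a polymer whose chain was blank in the PDB file. The function name added_chain_ids is hypothetical, and the parser assumes one row per line with no whitespace inside values; it is an illustration, not a general CIF parser.

```python
# Hypothetical sketch: list the asym IDs that were added for a chain that
# was blank in the original PDB file.
def added_chain_ids(loop_text):
    """Parse a _struct_asym loop and return IDs whose blank flag is 'Y'."""
    names, rows = [], []
    for line in loop_text.splitlines():
        line = line.strip()
        if not line or line in ("#", "loop_"):
            continue  # skip separators and the loop keyword
        if line.startswith("_struct_asym."):
            names.append(line[len("_struct_asym."):])
        else:
            # Assumes one row per line and no whitespace inside values.
            rows.append(dict(zip(names, line.split())))
    return [r["id"] for r in rows
            if r.get("pdbx_blank_PDB_chainid_flag") == "Y"]


PGB = """\
#
loop_
_struct_asym.id
_struct_asym.pdbx_blank_PDB_chainid_flag
_struct_asym.pdbx_modified
_struct_asym.entity_id
_struct_asym.details
A Y N 1 ?
B N N 2 ?
#
"""
print(added_chain_ids(PGB))  # prints ['A']
```

Only the polymer (asym ID A) is flagged for 1pgb. The water (asym ID B) keeps 'N' even though its label_asym_id was generated, which matches the answer that chain IDs were added for polymers only.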
>
>>
>> Another issue, this time with both CIF and PDB:
>>
>> SHEET 1 S1 4 LEU 12 ALA 20
0 1PGB 57
>> SHEET 2 S1 4 MET 1 GLY 9
-1 1PGB 58
>> SHEET 3 S1 4 LYS 50 GLU 56
1 1PGB 59
>> SHEET 4 S1 4 GLU 42 ASP 46
-1 1PGB 60
>> TURN 1 T1 GLY 9 LEU 12 H-BOND ABSENT
9-12 1PGB 61
>> TURN 2 T2 ASP 47 LYS 50 LYS 50 IN L-HELIX
CONFORMATION 1PGB 62
>>
>>
>> Note the duplication of 9 and of 50. How can a residue be in
both a sheet and a turn? Is that a mistake? Is it common? This
becomes an issue for us because we have to display one or the
other; we don't have a way of a group being both in a turn and in a
beta-pleated sheet. But this leads to differences in structure for
PDB and mmCIF, just because of the order with which these are
defined -- SHEET,TURN in this PDB; TURN,SHEET in mmCIF. What do you
recommend?
>
>
> I think it has to do with which program you used to calculate
secondary structures. I think it's possible for terminal residues
to be in both secondary structure units, although it's not very common.
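The order dependence can be reproduced with a short Python sketch. It assumes a last-writer-wins rule, where each record simply overwrites whatever label a residue already has; that rule and the function name assign are assumptions made for illustration and are not necessarily what Jmol does. The residue ranges are the 1pgb SHEET and TURN records quoted above.

```python
# Hypothetical last-writer-wins assignment; not necessarily Jmol's logic.
SHEETS = [(12, 20), (1, 9), (50, 56), (42, 46)]  # 1pgb SHEET records
TURNS = [(9, 12), (47, 50)]                       # 1pgb TURN records


def assign(records):
    """Apply (label, ranges) pairs in order; later labels overwrite earlier."""
    structure = {}
    for label, ranges in records:
        for first, last in ranges:
            for resno in range(first, last + 1):
                structure[resno] = label
    return structure


pdb_order = assign([("SHEET", SHEETS), ("TURN", TURNS)])  # order in the PDB
cif_order = assign([("TURN", TURNS), ("SHEET", SHEETS)])  # order in the mmCIF
differing = sorted(r for r in pdb_order if pdb_order[r] != cif_order[r])
print(differing)  # prints [9, 12, 50]
```

Under this assumption the overlapping residues end up as TURN when read in PDB order and as SHEET when read in mmCIF order. A reader that wants identical results from both formats would need a fixed precedence rule instead of relying on record order.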
>
>>
>> Finally, if you can help me out on understanding the
interrelated roles of asym_id, label_entity_id, auth_asym_id, and
similar for seq_id, I would really appreciate it.
>
>
> _atom_site.label_atom_id, _atom_site.label_comp_id,
_atom_site.label_asym_id, _atom_site.label_seq_id,
_atom_site.label_entity_id are items used to define cif
nomenclature. For atom name (_atom_site.label_atom_id), we use
IUPAC nomenclature. The residue name (_atom_site.label_comp_id)
should be the same as in PDB nomenclature. The sequence number (always
starting at 1) and asym ID (always starting at 'A') are automatically
generated by the program.
>
> _atom_site.auth_atom_id, _atom_site.auth_comp_id,
_atom_site.auth_asym_id, _atom_site.auth_seq_id are used to define
PDB nomenclature. In most cases, it should match with PDB files.
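As an illustration of the two naming schemes, the following Python sketch reports which of the four identifiers differ between the label_* (CIF) and auth_* (PDB) items for one atom. The function name label_auth_differences is hypothetical, and the example values are read off the HETATM 2802 row quoted earlier, assuming the usual atom_site column order (which is not spelled out in this thread).

```python
# Hypothetical helper: compare CIF-style (label_*) and PDB-style (auth_*)
# identifiers for a single atom.
ITEMS = ("atom_id", "comp_id", "asym_id", "seq_id")


def label_auth_differences(row):
    """row: dict of _atom_site item name -> value. Return differing items."""
    return [item for item in ITEMS
            if row["label_" + item] != row["auth_" + item]]


# Values assumed from the HETATM 2802 row quoted earlier in this thread.
nucleotide_a = {
    "label_atom_id": "O5'", "auth_atom_id": "O5*",
    "label_comp_id": "A", "auth_comp_id": "A",
    "label_asym_id": "B", "auth_asym_id": "?",
    "label_seq_id": ".", "auth_seq_id": "351",
}
print(label_auth_differences(nucleotide_a))
# prints ['atom_id', 'asym_id', 'seq_id']
```

Here only the residue name agrees. The atom name follows IUPAC on the label side, the asym ID is generated on the label side and blank ('?') on the auth side, and the sequence number exists only on the auth side.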
OK, but we have two cases above where it does not. My question then
is whether this is a mmCIF error or intentional. What exactly does
it mean when these fields do not match the PDB file?
In these cases, they are intentional. Currently we are working on a
project to clean up all entries in the PDB archive. The goal of this project
is to fix as many errors as we can and make files more consistent across
the whole archive. We hope we'll have a new set of mmCIF files (fewer errors,
but not error-free) by the end of the year. There will be lots of
differences between mmCIF and PDB files. There could be some mistakes
made during conversion. Overall, mmCIF files will definitely be better than
the original PDB files.
>
>
>>
>> Bob Hanson
>> Jmol Development Team
>> [EMAIL PROTECTED]
>>
>>
>
>
>
--
Robert M. Hanson, [EMAIL PROTECTED], 507-646-3107
Professor of Chemistry, St. Olaf College
1520 St. Olaf Ave., Northfield, MN 55057
mailto:[EMAIL PROTECTED]
http://www.stolaf.edu/people/hansonr
"Imagination is more important than knowledge." - Albert Einstein
/* - - - - - - - - - - - - - - - - - - - - - - - - - - -
Eric Martz, Professor Emeritus, Dept Microbiology
U Mass, Amherst -- http://www.umass.edu/molvis/martz
Biochem 3D Education Resources http://MolviZ.org
See 3D Molecules, Install Nothing! - http://firstglance.jmol.org
Protein Explorer - 3D Visualization: http://proteinexplorer.org
Workshops: http://workshops.proteinexplorer.org
World Index of Molecular Visualization Resources: http://molvisindex.org
ConSurf - Find Conserved Patches in Proteins: http://consurf.tau.ac.il
Atlas of Macromolecules: http://atlas.proteinexplorer.org
PDB Lite Macromolecule Finder: http://pdblite.org
Molecular Visualization EMail List (molvis-list):
http://bioinformatics.org/mailman/listinfo/molvis-list
- - - - - - - - - - - - - - - - - - - - - - - - - - - */
_______________________________________________
Jmol-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/jmol-developers