Re: [ccp4bb] a dinosaur asks ... PDB format query
The old BNL PDB format documents are available at
https://www.wwpdb.org/documentation/file-format

I looked at the Feb 1992 document to be certain that my memory was correct [I am 6 days shy of 82 and am reassured that my recollection was, in fact, correct] about the representation of hydrogen names. For hydrogens we had to violate the rule of a two-character chemical symbol, right-adjusted, followed by two location characters, left-adjusted, because we would have needed three location characters.

I then looked at 2MB5, which was deposited in 1989 and has hydrogens. At BNL we would have put, e.g., 1HD1, 2HD1, 3HD1 for leucine hydrogens; the current version of the entry has HD11, HD12, HD13. And the redone representation of HEM is totally confusing to me.

Given the revisions made by the RCSB PDB it makes sense to use the element type and not the atom name.

Frances Bernstein

On 2024-05-15 11:34, Harry Powell wrote:
> Hi Robbie
>
> I’m not actually using PDB files of proteins - I’m using the PDB format files in PDBeChem, because at the moment I’m interested in doing stuff with ligands/substrates/etc.
>
> The charges I’ve seen so far seem to be not quite what I’d expect, but I’m prepared to work around that.
>
> Harry
>
> On 15 May 2024, at 16:24, Robbie Joosten wrote:
>> Hi Harry,
>>
>> It might be better now, but there used to be positively charged aspartates in the PDB. You have a better chance taking charges out of the CCD for your atoms of interest. I'm not saying all charges in the CCD are correct, but they are much more reliable. If you find errors, please report them to the proper authority. See it, say it, sorted.
>>
>> Cheers,
>> Robbie
>>
>> On 15 May 2024 14:41, Harry Powell <193323b1e616-dmarc-requ...@jiscmail.ac.uk> wrote:
>>> Hi
>>>
>>> This is very, very useful and hits on the four-letter name problem that I am encountering - thank you. Saves me trying to produce a new design for a circular object with an axle…
>>>
>>> For the files that I am trying to use, columns 77-78 are present (actually, columns 79-80 are there so I can read the atomic charge as well, which is useful for my purposes) so I’m hoping that this will be reliable.
>>>
>>> Harry
>>>
>>> On 15 May 2024, at 12:38, Marcin Wojdyr wrote:
>>>>> • Alignment of one-letter atom name such as C starts at column 14, while two-letter atom name such as FE starts at column 13.
>>>>>
>>>>> indicating a rule does exist.
>>>>
>>>> There are programs that don't read/write the element from columns 77-78, so this rule still matters, but using it is less reliable, as Robbie wrote. After I wrote a function that reads pdb files for gemmi, over the next few years I received feedback about cases in which the element columns are absent and the element determined from the atom name is incorrect. The problem is primarily with 4-character atom names that can't be aligned, because they use all the four columns anyway. I added such comments to the code [1] when trying to get it right:
>>>>
>>>> // Atom names HXXX are ambiguous, but Hg, He, Hf, Ho and Hs (almost)
>>>> // never have 4-character names, so H is assumed.
>>>>
>>>> // Similarly Deuterium (DXXX), but here alternatives are Dy, Db and Ds.
>>>> // Only Dysprosium is present in the PDB - in a single entry as of 2022.
>>>>
>>>> // Old versions of the PDB format had hydrogen names such as "1HB ".
>>>> // Some MD files use similar names for other elements ("1C4A" -> C).
>>>>
>>>> // ... or it can be "C210"
>>>>
>>>> [1] https://github.com/project-gemmi/gemmi/blob/148f37b7c6561c3a255a6a4dd75d6bae888e/include/gemmi/pdb.hpp#L302

To unsubscribe from the CCP4BB list, click the following link:
https://www.jiscmail.ac.uk/cgi-bin/WA-JISC.exe?SUBED1=CCP4BB&A=1

This message was issued to members of www.jiscmail.ac.uk/CCP4BB, a mailing list hosted by www.jiscmail.ac.uk, terms & conditions are available at https://www.jiscmail.ac.uk/policyandsecurity/
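The heuristics in Marcin's quoted code comments can be illustrated with a small Python sketch. This is not gemmi's implementation; the function name `guess_element` and the exact ordering of the rules are assumptions made for illustration, and the sketch only covers the case where columns 77-78 are absent and the element has to be guessed from columns 13-16.

```python
def guess_element(name_field: str) -> str:
    """Guess an element symbol from the atom-name field (columns 13-16).

    Hypothetical helper, loosely following the rules of thumb quoted above.
    """
    field = name_field[:4].ljust(4)
    first, second = field[0], field[1]
    if first == " " or first.isdigit():
        # " CA " (C-alpha), old-style "1HB " or MD-style "1C4A":
        # a one-letter element symbol sits in column 14.
        return second.upper()
    if " " not in field:
        # Four-character names such as "HD11" or "C210" fill every column,
        # so alignment carries no information; assume a one-letter element.
        return first.upper()
    if second.isalpha():
        # Name starts in column 13 with two letters: "FE  ", "CA  " (calcium).
        return first.upper() + second.lower()
    return first.upper()
```

With these rules `guess_element(" CA ")` gives `C` and `guess_element("CA  ")` gives `Ca`, but a left-justified heme carbon such as `"CHA "` comes out as `Ch`, which is the kind of wrong answer that makes the element columns the safer source.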
Re: [ccp4bb] a dinosaur asks ... PDB format query
It says in the aforementioned docs:

"Alignment of one-letter atom name such as C starts at column 14, while two-letter atom name such as FE starts at column 13."

and no, hopefully they don't mean that, since the Example section below it shows plenty of 2-letter and 3-letter atom names starting at column 14, which directly contradicts that statement. They mean one-letter and two-letter element names, where possible, but as previously discussed there are many atom names that don't fit that model. Plus, since they define element names elsewhere, they perhaps don't want to conflate this data.

PDB's own format definition is both incorrect and confusing. Sadly I couldn't find PDB format v2 definitions, to see if the description changed.

Phil
(Column numbers starting at 1, I'm having a brief moment of Fortran nostalgia)

On 5/15/24 2:16 PM, Paul Emsley wrote:
> On 15/05/2024 18:45, Filipe Maia wrote:
>>> It is, I think you would agree, unconventional to put a CA label for a main-chain carbon at positions 13 and 14 (I have never seen such a thing). But is it wrong ("Incorrect" - as Harry labels it)? In this case, putting "CHA" in positions 13-15 is unconventional (again, I have never seen such a thing) - but is it wrong? The official PDB documentation, according to my reading at least, is not clear.
>>
>> As Harry pointed out the documentation at https://www.wwpdb.org/documentation/file-format-content/format33/sect9.html#ATOM says in the "Details" that it's incorrect.
>
> Maybe I am being dense, sorry, but could you be more clear about what you mean here?
>
> Paul.
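Read the way Phil suggests (the rule is about the element symbol, not the atom name), the alignment convention can be written down as a short Python sketch. The function name `format_atom_name` is hypothetical, and the sketch assumes the atom name begins with its element symbol, which, as noted in the thread, no rule guarantees.

```python
def format_atom_name(name: str, element: str) -> str:
    """Return the 4-character field for columns 13-16.

    Assumed convention: a two-letter element symbol starts in column 13,
    a one-letter symbol starts in column 14, and a four-character name
    uses all four columns whatever the element is.
    """
    name = name.strip()
    if not 1 <= len(name) <= 4:
        raise ValueError(f"atom name must have 1-4 characters: {name!r}")
    if len(name) == 4 or len(element.strip()) == 2:
        return name.ljust(4)
    return (" " + name).ljust(4)
```

For example, `format_atom_name("CA", "C")` yields `" CA "` for a C-alpha while `format_atom_name("CA", "CA")` yields `"CA  "` for calcium, so the two differ only by one column of whitespace.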
Re: [ccp4bb] a dinosaur asks ... PDB format query
'CA' for carbon-alpha is a 2-letter atom name, so applying the rule exactly as stated it should start in column 13; the rule states that only 1-letter atom names (C, N, O) go in column 14. This must be a case of poorly-written documentation; it must mean 'element symbol' not 'atom name' in those cases where the 'atom name' begins with the 'element symbol', though as Robbie points out there's no rule that the atom name must begin with the element symbol.

COLUMNS   DATA TYPE    FIELD     DEFINITION
13 - 16   Atom         name      Atom name.
77 - 78   LString(2)   element   Element symbol, right-justified.

- Alignment of one-letter atom name such as C starts at column 14, while two-letter atom name such as FE starts at column 13.

I.

On Wed, May 15, 2024 at 11:56 AM Harry Powell <193323b1e616-dmarc-requ...@jiscmail.ac.uk> wrote:
> Sorry - just read that
>
>   • Alignment of one-letter atom name such as C starts at column 14, while two-letter atom name such as FE starts at column 13.
>
> indicating a rule does exist.
>
> Harry
>
> On 15 May 2024, at 11:54, Harry Powell <193323b1e616-dmarc-requ...@jiscmail.ac.uk> wrote:
>> Hi Ezra
>>
>> Thanks for this.
>>
>> In other words, would it be true to say that there are no actual rules about what appears in columns 13-16 because “it's a rose by any other name”?
>>
>> Harry
>>
>> On 15 May 2024, at 11:38, Ezra Peisach wrote:
>>> If you take a look at https://www.wwpdb.org/documentation/file-format-content/format33/sect9.html#ATOM
>>>
>>> you will see the following:
>>>
>>> 77 - 78   LString(2)   element   Element symbol, right-justified.
>>>
>>> Going by atom name will get you in trouble. As you stated, calcium vs Calpha. The element symbol comes from the chemical component dictionary.
>>>
>>> Ezra
>>>
>>> On 5/15/24 6:28 AM, Harry Powell wrote:
>>>> Hi folks
>>>>
>>>> I’m sure that this has been answered many times before (I’m sure that when I was young I even read it here…), and I *know* that we should all be using mmCIF, but I’m using PDB format files generated by a popular Python module and I wanted to check the output against a definitive format definition (if that’s not tautology).
>>>>
>>>> I noticed this because I was encouraged to try Moorhen and found that a HEM (apparently written by this module) did not have the atoms connected with bonds in the display.
>>>>
>>>> I’m particularly interested in metal atoms here, and want to be 100% sure that I’ve found a calcium, say, and not a C-alpha.
>>>>
>>>> Q: Is it necessary to check columns 77-78 if I really want to be sure?
>>>>
>>>> I’ve read the following, but can’t see anything obvious in “official” PDB documentation that what it says here is actually defined anywhere:
>>>>
>>>>> Atom names are composed of an atomic (element) symbol right-justified in columns 13-14, and trailing identifying characters left-justified in columns 15-16. A single-character element symbol should not appear in column 13 unless the atom name has four characters (for example, see Hydrogen Atoms). Many programs simply left-justify all atom names starting in column 13. The difference can be seen clearly in a short segment of hemoglobin (entry 3hhb):
>>>>>
>>>>> Correct:
>>>>> HETATM 1071 FE   HEM A   1       8.128   7.371 -15.022 24.00 16.74          FE
>>>>> HETATM 1072  CHA HEM A   1       8.617   7.879 -18.361  6.00 17.74           C
>>>>> HETATM 1073  CHB HEM A   1      10.356  10.005 -14.319  6.00 18.92           C
>>>>> HETATM 1074  CHC HEM A   1       8.307   6.456 -11.669  6.00 11.00           C
>>>>> HETATM 1075  CHD HEM A   1       6.928   4.145 -15.725  6.00 13.25           C
>>>>>
>>>>> Incorrect:
>>>>> HETATM 1071 FE   HEM A   1       8.128   7.371 -15.022 24.00 16.74          FE
>>>>> HETATM 1072 CHA  HEM A   1       8.617   7.879 -18.361  6.00 17.74           C
>>>>> HETATM 1073 CHB  HEM A   1      10.356  10.005 -14.319  6.00 18.92           C
>>>>> HETATM 1074 CHC  HEM A   1       8.307   6.456 -11.669  6.00 11.00           C
>>>>> HETATM 1075 CHD  HEM A   1       6.928   4.145 -15.725  6.00 13.25           C
>>>>
>>>> I’m sure that someone here will say “why don’t you look at *, it’s obvious”, in which case - many thanks!
>>>>
>>>> help
>>>>
>>>> Harry
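Ezra's advice, to read the element from columns 77-78 rather than infer it from the atom name, can be sketched in a few lines of Python. The helper names `element_of` and `pad` are made up for this example, and the two records are abbreviated, invented lines (coordinates omitted, padded out to column 78) rather than data from a real entry.

```python
from typing import Optional


def element_of(line: str) -> Optional[str]:
    """Return the element symbol from columns 77-78, or None if absent."""
    if not line.startswith(("ATOM  ", "HETATM")):
        return None
    symbol = line[76:78].strip()  # columns 77-78, zero-based slice
    return symbol.capitalize() if symbol.isalpha() else None


def pad(prefix: str, element: str) -> str:
    """Illustration only: pad a record prefix so the element lands in 77-78."""
    return prefix.ljust(76) + element.rjust(2)


calcium = pad("HETATM  500 CA    CA A 101", "CA")
c_alpha = pad("ATOM      2  CA  ALA A   1", "C")

# Both atoms are named "CA" once whitespace is stripped from columns 13-16 ...
same_name = calcium[12:16].strip() == c_alpha[12:16].strip()
# ... but the element columns tell them apart.
elements = (element_of(calcium), element_of(c_alpha))
```

A record that stops before column 77 makes `element_of` return `None`, which is the situation where a reader has to fall back on the atom-name alignment.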
Re: [ccp4bb] a dinosaur asks ... PDB format query
On 15/05/2024 18:45, Filipe Maia wrote:
>> It is, I think you would agree, unconventional to put a CA label for a main-chain carbon at positions 13 and 14 (I have never seen such a thing). But is it wrong ("Incorrect" - as Harry labels it)? In this case, putting "CHA" in positions 13-15 is unconventional (again, I have never seen such a thing) - but is it wrong? The official PDB documentation, according to my reading at least, is not clear.
>
> As Harry pointed out the documentation at https://www.wwpdb.org/documentation/file-format-content/format33/sect9.html#ATOM says in the "Details" that it's incorrect.

Maybe I am being dense, sorry, but could you be more clear about what you mean here?

Paul.
Re: [ccp4bb] a dinosaur asks ... PDB format query
> It is, I think you would agree, unconventional to put a CA label for a main-chain carbon at positions 13 and 14 (I have never seen such a thing). But is it wrong ("Incorrect" - as Harry labels it)? In this case, putting "CHA" in positions 13-15 is unconventional (again, I have never seen such a thing) - but is it wrong? The official PDB documentation, according to my reading at least, is not clear.

As Harry pointed out, the documentation at https://www.wwpdb.org/documentation/file-format-content/format33/sect9.html#ATOM says in the "Details" that it's incorrect.

Cheers,
Filipe
Re: [ccp4bb] a dinosaur asks ... PDB format query
Hi Robbie

I’m not actually using PDB files of proteins - I’m using the PDB format files in PDBeChem, because at the moment I’m interested in doing stuff with ligands/substrates/etc.

The charges I’ve seen so far seem to be not quite what I’d expect, but I’m prepared to work around that.

Harry

> On 15 May 2024, at 16:24, Robbie Joosten wrote:
>
> Hi Harry,
>
> It might be better now, but there used to be positively charged aspartates in the PDB. You have a better chance taking charges out of the CCD for your atoms of interest. I'm not saying all charges in the CCD are correct, but they are much more reliable. If you find errors, please report them to the proper authority. See it, say it, sorted.
>
> Cheers,
> Robbie
Re: [ccp4bb] a dinosaur asks ... PDB format query
On 15/05/2024 11:28, Harry Powell wrote:
> Hi folks
>
> [...]
>
> I noticed this because I was encouraged to try Moorhen and found that a HEM (apparently written by this module) did not have the atoms connected with bonds in the display.
>
> Q: Is it necessary to check columns 77-78 if I really want to be sure?
>
> I’ve read the following, but can’t see anything obvious in “official” PDB documentation that what it says here is actually defined anywhere:
>
>> Atom names are composed of an atomic (element) symbol right-justified in columns 13-14, and trailing identifying characters left-justified in columns 15-16. A single-character element symbol should not appear in column 13 unless the atom name has four characters (for example, see Hydrogen Atoms). Many programs simply left-justify all atom names starting in column 13. The difference can be seen clearly in a short segment of hemoglobin (entry 3hhb):
>>
>> Correct:
>> HETATM 1071 FE   HEM A   1       8.128   7.371 -15.022 24.00 16.74          FE
>> HETATM 1072  CHA HEM A   1       8.617   7.879 -18.361  6.00 17.74           C
>> HETATM 1073  CHB HEM A   1      10.356  10.005 -14.319  6.00 18.92           C
>> HETATM 1074  CHC HEM A   1       8.307   6.456 -11.669  6.00 11.00           C
>> HETATM 1075  CHD HEM A   1       6.928   4.145 -15.725  6.00 13.25           C
>>
>> Incorrect:
>> HETATM 1071 FE   HEM A   1       8.128   7.371 -15.022 24.00 16.74          FE
>> HETATM 1072 CHA  HEM A   1       8.617   7.879 -18.361  6.00 17.74           C
>> HETATM 1073 CHB  HEM A   1      10.356  10.005 -14.319  6.00 18.92           C
>> HETATM 1074 CHC  HEM A   1       8.307   6.456 -11.669  6.00 11.00           C
>> HETATM 1075 CHD  HEM A   1       6.928   4.145 -15.725  6.00 13.25           C

I have a different slant on this - "is there anything that I need to do to fix the parsing of the above file?" - or to put it another way, "Who's wrong? Moorhen or this file?"

It is, I think you would agree, unconventional to put a CA label for a main-chain carbon at positions 13 and 14 (I have never seen such a thing). But is it wrong ("Incorrect" - as Harry labels it)? In this case, putting "CHA" in positions 13-15 is unconventional (again, I have never seen such a thing) - but is it wrong?

The official PDB documentation, according to my reading at least, is not clear.

Regards,

Paul.
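One way to make Paul's "who's wrong?" question concrete is to compare the alignment of columns 13-16 with what the element columns imply. The Python sketch below is only an illustration of that check: the name `alignment_status` and its three return labels are invented here, it says nothing about how Moorhen or any other program actually parses the file, and the sample records are abbreviated versions of the quoted heme lines with the coordinates left out.

```python
def alignment_status(line: str) -> str:
    """Classify the atom-name alignment of one ATOM/HETATM record.

    Returns "unknown" (no element in columns 77-78), "full-width"
    (four-character name, nothing to align), "conventional" or
    "unconventional". Hypothetical labels, for illustration only.
    """
    field = line[12:16].ljust(4)
    name = field.strip()
    element = line[76:78].strip()
    if not element or not name:
        return "unknown"
    if len(name) == 4:
        return "full-width"
    # Two-letter elements start in column 13, one-letter ones in column 14.
    expected = name.ljust(4) if len(element) == 2 else (" " + name).ljust(4)
    return "conventional" if field == expected else "unconventional"


def pad(prefix: str, element: str) -> str:
    """Illustration only: pad a record prefix so the element lands in 77-78."""
    return prefix.ljust(76) + element.rjust(2)


statuses = [
    alignment_status(pad("HETATM 1071 FE   HEM A   1", "FE")),
    alignment_status(pad("HETATM 1072  CHA HEM A   1", "C")),
    alignment_status(pad("HETATM 1072 CHA  HEM A   1", "C")),
]
```

A tolerant reader could simply strip the name and trust the element, using a check like this only to warn about files that left-justify everything.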
Re: [ccp4bb] a dinosaur asks ... PDB format query
Hi Harry,

It might be better now, but there used to be positively charged aspartates in the PDB. You have a better chance taking charges out of the CCD for your atoms of interest. I'm not saying all charges in the CCD are correct, but they are much more reliable. If you find errors, please report them to the proper authority. See it, say it, sorted.

Cheers,
Robbie

On 15 May 2024 14:41, Harry Powell <193323b1e616-dmarc-requ...@jiscmail.ac.uk> wrote:
> Hi
>
> This is very, very useful and hits on the four-letter name problem that I am encountering - thank you. Saves me trying to produce a new design for a circular object with an axle…
>
> For the files that I am trying to use, columns 77-78 are present (actually, columns 79-80 are there so I can read the atomic charge as well, which is useful for my purposes) so I’m hoping that this will be reliable.
>
> Harry
Re: [ccp4bb] AW: [ccp4bb] a dinosaur asks ... PDB format query
> It sounds as though you need the power of the script. You can (from memory) run pdbcur to drop the aniso lines and hydrogen atoms, which helps.

Or from the command line:

  gemmi convert --anisou=no --remove-h in.pdb out.pdb

> You could probably get it to delete everything except CA's too.

This would be:

  gemmi convert --select='CA[C]' --anisou=no --minimal in.pdb out.pdb

(--minimal drops REMARKs and other metadata)
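For the original task (B-factors of C-alpha atoms from a file with interleaved ANISOU lines), a plain-Python sketch using only fixed-column slicing might look like the following. It is an illustration alongside the gemmi commands above, not a substitute for them; the function name `ca_b_factors`, the helper `_record` and the sample records are all invented for the example.

```python
def ca_b_factors(lines):
    """Collect (chain, residue number, residue name, B-factor) for C-alphas."""
    rows = []
    for line in lines:
        if not line.startswith(("ATOM  ", "HETATM")):
            continue  # skips ANISOU, REMARK, TER and everything else
        field = line[12:16]
        if field.strip() != "CA":
            continue
        element = line[76:78].strip().upper()
        # Prefer the element columns; fall back to alignment if they are absent.
        is_c_alpha = element == "C" if element else field == " CA "
        if is_c_alpha:
            rows.append((line[21], int(line[22:26]),
                         line[17:20].strip(), float(line[60:66])))
    return rows


def _record(prefix, x, y, z, occ, b, element):
    """Illustration only: build a fixed-column record from made-up values."""
    body = f"{prefix}    {x:8.3f}{y:8.3f}{z:8.3f}{occ:6.2f}{b:6.2f}"
    return body.ljust(76) + element.rjust(2)


sample = [
    _record("ATOM      1  N   ALA A   1", 1.0, 2.0, 3.0, 1.0, 18.5, "N"),
    _record("ATOM      2  CA  ALA A   1", 2.0, 3.0, 4.0, 1.0, 20.0, "C"),
    "ANISOU    2  CA  ALA A   1     1234   1234   1234     12     12     12",
    _record("HETATM  500 CA    CA A 101", 9.0, 9.0, 9.0, 1.0, 30.0, "CA"),
]
b_factors = ca_b_factors(sample)
```

In practice the same function can be fed an open file object, since it only needs an iterable of lines.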
Re: [ccp4bb] AW: [ccp4bb] a dinosaur asks ... PDB format query
Dear Jon,

If I understand your question right, I would use Gemmi for this purpose:
https://gemmi.readthedocs.io/en/latest/mol.html
https://gemmi.readthedocs.io/en/latest/analysis.html

It's not a GUI; it involves scripting in Python. It's a very powerful tool, capable of working with both PDB and mmCIF formats and with both proteins and nucleic acids.

Cheers,
Martin

On 15/05/2024 13:11, Hughes, Jonathan wrote:
> hello CCP4 people, rather off-topic: is there a purpose-written windows editor for PDB files? with interleaved anisotropy lines, missing column delimiters etc., simply extracting the B-factors for Ca atoms is hard work using a standard character editor. would anyone think of working with DNA without proper tools?
> best
> jon
> --
> Prof. Dr. Jon Hughes
> Department of Physics, Free University of Berlin & Institute for Plant Physiology, Justus Liebig University Giessen, Germany
Re: [ccp4bb] a dinosaur asks ... PDB format query
For those wanting a text editor solution, there is the purpose-built pdb-mode plugin for (x)emacs, which works under most operating systems. The download location has moved around a bit, but a current version is available from https://github.com/mmagnus/emacs-pdb-mode/ with more details at https://bondxray.org/software/pdb-mode/

Hope this helps,

Andy Purkiss

From: CCP4 bulletin board on behalf of Hughes, Jonathan
Sent: 15 May 2024 13:11
To: CCP4BB@JISCMAIL.AC.UK
Subject: [ccp4bb] AW: [ccp4bb] a dinosaur asks ... PDB format query

> hello CCP4 people, rather off-topic: is there a purpose-written windows editor for PDB files? with interleaved anisotropy lines, missing column delimiters etc., simply extracting the B-factors for Ca atoms is hard work using a standard character editor. would anyone think of working with DNA without proper tools?
> best
> jon
> --
> Prof. Dr. Jon Hughes
> Department of Physics, Free University of Berlin & Institute for Plant Physiology, Justus Liebig University Giessen, Germany
Re: [ccp4bb] a dinosaur asks ... PDB format query
Hi

This is very, very useful and hits on the four-letter name problem that I am encountering - thank you. Saves me trying to produce a new design for a circular object with an axle…

For the files that I am trying to use, columns 77-78 are present (actually, columns 79-80 are there so I can read the atomic charge as well, which is useful for my purposes) so I’m hoping that this will be reliable.

Harry

> On 15 May 2024, at 12:38, Marcin Wojdyr wrote:
>
>>> • Alignment of one-letter atom name such as C starts at column 14, while two-letter atom name such as FE starts at column 13.
>>
>> indicating a rule does exist.
>
> There are programs that don't read/write the element from columns 77-78, so this rule still matters, but using it is less reliable, as Robbie wrote. After I wrote a function that reads pdb files for gemmi, over the next few years I received feedback about cases in which the element columns are absent and the element determined from the atom name is incorrect. The problem is primarily with 4-character atom names that can't be aligned, because they use all the four columns anyway. I added such comments to the code [1] when trying to get it right:
>
> // Atom names HXXX are ambiguous, but Hg, He, Hf, Ho and Hs (almost)
> // never have 4-character names, so H is assumed.
>
> // Similarly Deuterium (DXXX), but here alternatives are Dy, Db and Ds.
> // Only Dysprosium is present in the PDB - in a single entry as of 2022.
>
> // Old versions of the PDB format had hydrogen names such as "1HB ".
> // Some MD files use similar names for other elements ("1C4A" -> C).
>
> // ... or it can be "C210"
>
> [1] https://github.com/project-gemmi/gemmi/blob/148f37b7c6561c3a255a6a4dd75d6bae888e/include/gemmi/pdb.hpp#L302
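Harry mentions reading the atomic charge from columns 79-80. A minimal Python sketch of that step is given below, under the assumption that the field holds a digit followed by a sign (for example `2+` or `1-`); the reversed order (`+2`) is accepted as well because some writers may produce it. The function name `parse_charge` is hypothetical, and, as Robbie cautions, the values found there are not necessarily chemically sensible.

```python
from typing import Optional


def parse_charge(line: str) -> Optional[int]:
    """Return the formal charge from columns 79-80, or None if the field is blank."""
    field = line[78:80].strip()  # columns 79-80, zero-based slice
    if not field:
        return None
    digits = field.strip("+-")
    if digits and not digits.isdigit():
        raise ValueError(f"unrecognised charge field: {field!r}")
    magnitude = int(digits) if digits else 1  # a bare "+" or "-" counts as 1
    return -magnitude if "-" in field else magnitude


def with_charge(charge_field: str) -> str:
    """Illustration only: an otherwise empty record with columns 79-80 filled."""
    return "HETATM".ljust(78) + charge_field


charges = [parse_charge(with_charge(c)) for c in ("2+", "1-", "+2", "  ")]
```

Returning `None` for a blank field keeps "no charge given" distinct from an explicit zero.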
Re: [ccp4bb] AW: [ccp4bb] a dinosaur asks ... PDB format query
You could probably get it to delete everything except CA's too.

Best wishes, Jon Cooper.
jon.b.coo...@protonmail.com

On 15 May 2024, 13:22, Jon Cooper wrote:
> It sounds as though you need the power of the script. You can (from memory) run pdbcur to drop the aniso lines and hydrogen atoms, which helps.
>
> Best wishes, Jon Cooper.
> jon.b.coo...@protonmail.com
Re: [ccp4bb] AW: [ccp4bb] a dinosaur asks ... PDB format query
It sounds as though you need the power of the script. You can (from memory) run pdbcur to drop the aniso lines and hydrogen atoms, which helps.

Best wishes, Jon Cooper.
jon.b.coo...@protonmail.com
[ccp4bb] AW: [ccp4bb] a dinosaur asks ... PDB format query
hello CCP4 people,

rather off-topic: is there a purpose-written windows editor for PDB files? with interleaved anisotropy lines, missing column delimiters etc., simply extracting the B-factors for C-alpha atoms is hard work using a standard character editor. would anyone think of working with DNA without proper tools?

best
jon

--
Prof. Dr. Jon Hughes
Department of Physics
Free University of Berlin &
Institute for Plant Physiology
Justus Liebig University Giessen
Germany
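For the B-factor extraction asked about here, a few lines of script may be easier than an editor. The following is only a sketch, assuming the file follows the fixed-column PDB layout (atom name in columns 13-16, chain in column 22, residue number in columns 23-26, B-factor in columns 61-66); the function names are made up for illustration.

```python
def ca_b_factors(lines):
    """Yield (chain, residue number, B-factor) for each C-alpha atom."""
    for line in lines:
        # Only ATOM records are used, so ANISOU and HETATM lines are skipped.
        if not line.startswith("ATOM  "):
            continue
        # A C-alpha is " CA " (C in column 14); calcium would be "CA  ".
        if line[12:16] != " CA ":
            continue
        yield line[21], int(line[22:26]), float(line[60:66])


def read_ca_b_factors(path):
    """Convenience wrapper: read a PDB file and return the list of tuples."""
    with open(path) as handle:
        return list(ca_b_factors(handle))
```

Alternate conformations are not merged here, so a residue with two CA positions gives two rows.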
Re: [ccp4bb] a dinosaur asks ... PDB format query
It would also be good to check the monomer library (expanded with any user-supplied dictionaries). Cases where an element in columns 77-78 exists and it does not agree with the component definition should probably be flagged up.

Cheers,
Paul
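The check suggested above could look roughly like this. It is a sketch only: `component_elements` is a hypothetical dictionary mapping (residue name, atom name) to an element symbol, which in practice would have to be filled from the monomer library or the CCD; no real library API is used here.

```python
def element_mismatches(lines, component_elements):
    """Return (line number, residue, atom name, element in file, expected)."""
    problems = []
    for number, line in enumerate(lines, start=1):
        if not line.startswith(("ATOM  ", "HETATM")):
            continue
        resname = line[17:20].strip()
        atom_name = line[12:16].strip()
        in_file = line[76:78].strip().upper()  # columns 77-78
        expected = component_elements.get((resname, atom_name))
        # Flag only when both sources give an element and they disagree.
        if in_file and expected and in_file != expected.upper():
            problems.append((number, resname, atom_name, in_file, expected))
    return problems
```

Atoms with no element column, or with no entry in the dictionary, are left alone rather than reported, so the output lists genuine disagreements only.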
Re: [ccp4bb] a dinosaur asks ... PDB format query
> • Alignment of one-letter atom name such as C starts at column 14, while
> two-letter atom name such as FE starts at column 13.
>
> indicating a rule does exist.

There are programs that don't read/write the element from columns 77-78, so this rule still matters, but using it is less reliable, as Robbie wrote. After I wrote a function that reads pdb files for gemmi, over the next few years I received feedback about cases in which the element columns are absent and the element determined from the atom name is incorrect. The problem is primarily with 4-character atom names that can't be aligned, because they use all the four columns anyway. I added such comments to the code [1] when trying to get it right:

  // Atom names HXXX are ambiguous, but Hg, He, Hf, Ho and Hs (almost)
  // never have 4-character names, so H is assumed.

  // Similarly Deuterium (DXXX), but here alternatives are Dy, Db and Ds.
  // Only Dysprosium is present in the PDB - in a single entry as of 2022.

  // Old versions of the PDB format had hydrogen names such as "1HB ".
  // Some MD files use similar names for other elements ("1C4A" -> C).

  // ... or it can be "C210"

[1] https://github.com/project-gemmi/gemmi/blob/148f37b7c6561c3a255a6a4dd75d6bae888e/include/gemmi/pdb.hpp#L302
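To make the heuristics in those comments concrete, here is a rough Python sketch of a fallback that guesses the element from the 4-column atom name when columns 77-78 are absent. It is not gemmi's implementation; `guess_element` and `ELEMENTS` are names invented for this example, and the rules are only my reading of the comments quoted above.

```python
ELEMENTS = set("""
H He Li Be B C N O F Ne Na Mg Al Si P S Cl Ar K Ca Sc Ti V Cr Mn Fe Co Ni Cu
Zn Ga Ge As Se Br Kr Rb Sr Y Zr Nb Mo Tc Ru Rh Pd Ag Cd In Sn Sb Te I Xe Cs Ba
La Ce Pr Nd Pm Sm Eu Gd Tb Dy Ho Er Tm Yb Lu Hf Ta W Re Os Ir Pt Au Hg Tl Pb
Bi Po At Rn Fr Ra Ac Th Pa U Np Pu Am Cm Bk Cf Es Fm Md No Lr Rf Db Sg Bh Hs
Mt Ds Rg Cn Nh Fl Mc Lv Ts Og
""".upper().split()) | {"D"}  # D added for deuterium


def guess_element(atom_name):
    """Guess an upper-case element symbol from columns 13-16, or None."""
    name = atom_name.ljust(4)[:4].upper()
    # Column 13 blank or a digit (" CA ", "1HB ", "1C4A"):
    # a one-letter symbol sits in column 14.
    if name[0] == " " or name[0].isdigit():
        return name[1] if name[1] in ELEMENTS else None
    # Four-character names starting with H or D: assume hydrogen/deuterium.
    if name[3] != " " and name[0] in "HD":
        return name[0]
    # Otherwise prefer a two-letter symbol ("FE  "), else the first letter
    # ("C210", or a left-justified "CHA ").
    if name[:2] in ELEMENTS:
        return name[:2]
    return name[0] if name[0] in ELEMENTS else None
```

A left-justified C-alpha written as "CA  " still comes out as calcium here, which is exactly the failure that makes the name-based route unreliable.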
Re: [ccp4bb] a dinosaur asks ... PDB format query
Hi Harry,

Deducing the element from the atom name has always been unreliable, so since PDB version 3 you have to get it from columns 77-78. There is no implied element in the atom name anymore.

HTH,
Robbie
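In script form, taking the element from columns 77-78 is a short slice. This is a minimal sketch with invented function names, assuming fixed-column ATOM/HETATM records; a line that is too short to have the element columns gives None rather than a guess.

```python
def element_from_columns(line):
    """Return the upper-case element from columns 77-78, or None if absent."""
    if not line.startswith(("ATOM  ", "HETATM")):
        return None
    symbol = line[76:78].strip().upper()
    return symbol or None


def is_calcium(line):
    """True only if the element columns say calcium, whatever the atom name."""
    return element_from_columns(line) == "CA"
```

With this, a C-alpha (element " C") and a calcium ion (element "CA") are told apart without looking at the atom name at all.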
Re: [ccp4bb] a dinosaur asks ... PDB format query
Sorry - just read that

> • Alignment of one-letter atom name such as C starts at column 14, while
> two-letter atom name such as FE starts at column 13.

indicating a rule does exist.

Harry
Re: [ccp4bb] a dinosaur asks ... PDB format query
Hi Ezra

Thanks for this.

In other words, would it be true to say that there are no actual rules about what appears in columns 13-16 because “it's a rose by any other name”?

Harry
Re: [ccp4bb] a dinosaur asks ... PDB format query
If you take a look at https://www.wwpdb.org/documentation/file-format-content/format33/sect9.html#ATOM you will see the following:

77 - 78        LString(2)    element      Element symbol, right-justified.

Going by atom name will get you in trouble, as you stated: calcium vs. C-alpha. The element symbol comes from the chemical component dictionary.

Ezra
[ccp4bb] a dinosaur asks ... PDB format query
Hi folks

I’m sure that this has been answered many times before (I’m sure that when I was young I even read it here…), and I *know* that we should all be using mmCIF, but I’m using PDB format files generated by a popular Python module and I wanted to check the output against a definitive format definition (if that’s not tautology).

I noticed this because I was encouraged to try Moorhen and found that a HEM (apparently written by this module) did not have the atoms connected with bonds in the display.

I’m particularly interested in metal atoms here, and want to be 100% sure that I’ve found a calcium, say, and not a C-alpha.

Q: Is it necessary to check columns 77-78 if I really want to be sure?

I’ve read the following, but can’t see anything obvious in “official” PDB documentation that what it says here is actually defined anywhere:

> Atom names are composed of an atomic (element) symbol right-justified in
> columns 13-14, and trailing identifying characters left-justified in columns
> 15-16. A single-character element symbol should not appear in column 13
> unless the atom name has four characters (for example, see Hydrogen Atoms).
> Many programs simply left-justify all atom names starting in column 13. The
> difference can be seen clearly in a short segment of hemoglobin (entry 3hhb):
>
> Correct:
> HETATM 1071 FE   HEM A   1       8.128   7.371 -15.022 24.00 16.74          FE
> HETATM 1072  CHA HEM A   1       8.617   7.879 -18.361  6.00 17.74           C
> HETATM 1073  CHB HEM A   1      10.356  10.005 -14.319  6.00 18.92           C
> HETATM 1074  CHC HEM A   1       8.307   6.456 -11.669  6.00 11.00           C
> HETATM 1075  CHD HEM A   1       6.928   4.145 -15.725  6.00 13.25           C
>
> Incorrect:
> HETATM 1071 FE   HEM A   1       8.128   7.371 -15.022 24.00 16.74          FE
> HETATM 1072 CHA  HEM A   1       8.617   7.879 -18.361  6.00 17.74           C
> HETATM 1073 CHB  HEM A   1      10.356  10.005 -14.319  6.00 18.92           C
> HETATM 1074 CHC  HEM A   1       8.307   6.456 -11.669  6.00 11.00           C
> HETATM 1075 CHD  HEM A   1       6.928   4.145 -15.725  6.00 13.25           C

I’m sure that someone here will say “why don’t you look at *, it’s obvious”, in which case - many thanks!

help

Harry
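For anyone writing PDB files, the alignment rule quoted above can be expressed in a few lines. This is a sketch under the assumption that the writer already knows the element of each atom; `format_atom_name` is a name made up for this example and not part of any module mentioned in the thread.

```python
def format_atom_name(name, element):
    """Return the 4-character field for columns 13-16.

    Two-letter elements and four-character names start in column 13;
    shorter names of one-letter elements start in column 14.
    """
    name = name.strip()
    if not 1 <= len(name) <= 4:
        raise ValueError("atom name must have 1 to 4 characters")
    if len(name) == 4 or len(element.strip()) == 2:
        return name.ljust(4)
    return (" " + name).ljust(4)
```

For example, format_atom_name("CA", "C") gives " CA " for a C-alpha, while format_atom_name("CA", "CA") gives "CA  " for calcium, so the two remain distinguishable even for readers that ignore columns 77-78.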