Re: [Rdkit-discuss] Rdkit atom indexing vs indexing in written pdb file
Thank you very much Andrew! Indeed, I did not spot the pattern - how silly of me!

From: Andrew Dalke [da...@dalkescientific.com]
Sent: 01 February 2017 16:49
To: Susan Leung
Cc: rdkit-discuss@lists.sourceforge.net
Subject: Re: [Rdkit-discuss] Rdkit atom indexing vs indexing in written pdb file

--
Check out the vibrant tech community on one of the world's most engaging tech sites, SlashDot.org! http://sdm.link/slashdot
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
Re: [Rdkit-discuss] Rdkit atom indexing vs indexing in written pdb file
Dear Susan,

If I understand what's going on correctly, you have run across the difference between 0-based and 1-based indexing. See https://en.wikipedia.org/wiki/Zero-based_numbering .

RDKit, like most programming libraries and languages, indexes by offset from the beginning, so 0 means the beginning, 1 means one after the beginning, etc. This is somewhat like how some buildings use "1" for the first floor above the ground, while others regard "1" as the ground floor, which is confusing if you are not used to it. (My apartment number says it's on the second floor, while the elevator button says I live on floor 3.)

On Feb 1, 2017, at 5:15 PM, Susan Leung wrote:
> I am producing rdkit conformers and writing them to pdb files but am finding
> the atom indexing in rdkit is different from the written pdb.
  ...
> Here is my code and output (the C=O looks like it's atoms 3,4 in rdkit but
> 4,5 in the pdb file):
  ...
> In [3]: mol = Chem.MolFromSmiles("CC1=C(C(=O)C)C=CC=C1")
  ...
> In [4]: mol.GetSubstructMatch(Chem.MolFromSmiles('C(=O)'))
> Out[4]: (3, 4)
  ...
>   record_name atom_number blank_1 atom_name alt_loc residue_name blank_2 \
> 0      HETATM           1                C1                  UNL
> 1      HETATM           2                C2                  UNL
> 2      HETATM           3                C3                  UNL
> 3      HETATM           4                C4                  UNL
> 4      HETATM           5                O1                  UNL
> 5      HETATM           6                C5                  UNL
> 6      HETATM           7                C6                  UNL
> 7      HETATM           8                C7                  UNL
> 8      HETATM           9                C8                  UNL
> 9      HETATM          10                C9                  UNL

If I understand you correctly, then the "(3, 4)" as RDKit atom indices is (3+1, 4+1) = (4, 5) as PDB atom numbers. That is, the RDKit indices correspond to the left-most column of your table, rather than the atom_number column.

Cheers,

Andrew
da...@dalkescientific.com
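The off-by-one relationship described above can be sketched in a few lines of Python. This is only an illustration: the helper names `rdkit_to_pdb_numbers` and `pdb_numbers_to_rdkit` are hypothetical (they are not part of RDKit), and the +1 shift assumes the PDB file was written by RDKit from the same molecule, so that atom numbers run 1..N in RDKit atom order, as in the table quoted in the thread. It need not hold for PDB files from other sources.

```python
def rdkit_to_pdb_numbers(indices):
    """Convert 0-based RDKit atom indices to 1-based PDB atom numbers.

    Assumption: the PDB atoms are numbered 1..N in RDKit atom order.
    """
    return tuple(i + 1 for i in indices)


def pdb_numbers_to_rdkit(numbers):
    """Convert 1-based PDB atom numbers back to 0-based RDKit atom indices."""
    return tuple(n - 1 for n in numbers)


# The substructure match reported in the thread (Out[4]).
match = (3, 4)

pdb_numbers = rdkit_to_pdb_numbers(match)
print(pdb_numbers)                        # (4, 5)
print(pdb_numbers_to_rdkit(pdb_numbers))  # (3, 4)
```

Keeping the conversion in one place makes it harder to mix the two numbering schemes by accident.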
[Rdkit-discuss] Rdkit atom indexing vs indexing in written pdb file
Dear all,

I am producing rdkit conformers and writing them to pdb files, but I am finding that the atom indexing in rdkit is different from that in the written pdb. I would like the two to match because I want to do a substructure search (using rdkit) to give me a handle on these atoms in the pdb file. Apologies if this has been discussed before.

Here is my code and output (the C=O looks like it's atoms 3,4 in rdkit but 4,5 in the pdb file):

Thanks,
Susan

In [1]: import rdkit

In [2]: from rdkit import Chem
   ...: from rdkit.Chem import AllChem
   ...: from rdkit.Chem.Draw import IPythonConsole
   ...:

In [3]: mol = Chem.MolFromSmiles("CC1=C(C(=O)C)C=CC=C1")
   ...: idx = AllChem.EmbedMultipleConfs(mol,numConfs=1,randomSeed=0xf00d,
   ...:                                  useExpTorsionAnglePrefs=True,useBasicKnowledge=True)
   ...:

In [4]: mol.GetSubstructMatch(Chem.MolFromSmiles('C(=O)'))
Out[4]: (3, 4)

In [5]: Chem.MolToPDBFile(mol,'./test.pdb')

In [6]: import biopandas
   ...: from biopandas.pdb import PandasPDB
   ...: ppdb = PandasPDB()
   ...: ppdb.read_pdb('./test.pdb')
   ...: ppdb.df['HETATM']
   ...:
Out[6]:
  record_name atom_number blank_1 atom_name alt_loc residue_name blank_2 \
0      HETATM           1                C1                  UNL
1      HETATM           2                C2                  UNL
2      HETATM           3                C3                  UNL
3      HETATM           4                C4                  UNL
4      HETATM           5                O1                  UNL
5      HETATM           6                C5                  UNL
6      HETATM           7                C6                  UNL
7      HETATM           8                C7                  UNL
8      HETATM           9                C8                  UNL
9      HETATM          10                C9                  UNL

  chain_id residue_number insertion ... x_coord y_coord z_coord \
0                       1           ...   0.176   1.911   1.137
1                       1           ...  -0.513   0.759   0.511
2                       1           ...   0.272  -0.184  -0.139
3                       1           ...   1.717  -0.056  -0.210
4                       1           ...   2.406  -0.917  -0.801
5                       1           ...   2.344   1.118   0.435
6                       1           ...  -0.332  -1.286  -0.743
7                       1           ...  -1.696  -1.416  -0.682
8                       1           ...  -2.495  -0.504  -0.048
9                       1           ...  -1.879   0.575   0.540

  occupancy b_factor blank_4 segment_id element_symbol charge line_idx
0       1.0      0.0                                 C    NaN        0
1       1.0      0.0                                 C    NaN        1
2       1.0      0.0                                 C    NaN        2
3       1.0      0.0                                 C    NaN        3
4       1.0      0.0                                 O    NaN        4
5       1.0      0.0                                 C    NaN        5
6       1.0      0.0                                 C    NaN        6
7       1.0      0.0                                 C    NaN        7
8       1.0      0.0                                 C    NaN        8
9       1.0      0.0                                 C    NaN        9

[10 rows x 21 columns]
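For getting a handle on the matched atoms in the written file, a minimal sketch follows. It assumes standard fixed-column PDB records (atom serial number in columns 7-11, atom name in columns 13-16). The excerpt is the first five HETATM records re-typed from the table in the original post, so its exact spacing is a reconstruction rather than a copy of test.pdb, and `read_atom_names` is a hypothetical helper, not an RDKit or biopandas function.

```python
# First five HETATM records, re-typed from the table in the original post.
# The column spacing is reconstructed, not copied from the real test.pdb.
PDB_EXCERPT = """\
HETATM    1  C1  UNL     1       0.176   1.911   1.137  1.00  0.00           C
HETATM    2  C2  UNL     1      -0.513   0.759   0.511  1.00  0.00           C
HETATM    3  C3  UNL     1       0.272  -0.184  -0.139  1.00  0.00           C
HETATM    4  C4  UNL     1       1.717  -0.056  -0.210  1.00  0.00           C
HETATM    5  O1  UNL     1       2.406  -0.917  -0.801  1.00  0.00           O
"""


def read_atom_names(pdb_text):
    """Map PDB atom serial number -> atom name, using the fixed PDB columns."""
    names = {}
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            serial = int(line[6:11])       # columns 7-11: atom serial number
            name = line[12:16].strip()     # columns 13-16: atom name
            names[serial] = name
    return names


names = read_atom_names(PDB_EXCERPT)

# RDKit match from the thread: 0-based indices of the C=O atoms.
match = (3, 4)
for idx in match:
    serial = idx + 1  # assumes atoms were written in RDKit order, numbered from 1
    print(f"RDKit index {idx} -> PDB atom number {serial} ({names[serial]})")
# RDKit index 3 -> PDB atom number 4 (C4)
# RDKit index 4 -> PDB atom number 5 (O1)
```

With biopandas, the same lookup amounts to selecting rows whose atom_number equals the RDKit index plus one, rather than using the DataFrame's own 0-based row labels.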