Re: [Rdkit-discuss] Constructing a mol object from a PDB ligand (Dimitri Maziuk)
I had to do something similar on 40,000 PDB files and ended up using pdb-tools:

https://pypi.org/project/pdb-tools/

Cheers,
Chris
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
Re: [Rdkit-discuss] Constructing a mol object from a PDB ligand
Hi Illimar,

The RDKit PDB reader only recognizes standard amino acids and, once the PR I opened on Saturday (https://github.com/rdkit/rdkit/pull/2850) is merged, nucleic acid bases. Anything else will not have the correct hybridization or bond orders perceived: those are not encoded in the PDB format, and the PDB reader has no functionality to infer them.

The 1ARJ case is peculiar: it has an ARG residue, which would be recognized if it were in the ATOM records but not in the HETATM section, where no attempt is made to perceive the correct hybridization or bond orders.

My suggestion, if you are using standard PDB files, is to download the SDF file:

https://www.rcsb.org/pdb/download/downloadLigandFiles.do?ligandIdList=A2F&structIdList=3GOT&instanceType=all&excludeUnobserved=false&includeHydrogens=false

and construct your RDKit molecule from that. You should be able to automate this without too much effort, either by constructing URLs from the template above or by using the PDB REST API.

Cheers,
p.

On 16/12/2019 18:24, Illimar Hugo Rekand wrote:
> Thanks, Paolo, for a good and clear example. I adapted your code into my
> workflow to calculate some Lipinski-properties of RNA pdb-structures, and
> ran into some issues.
> [...]
> When parsing through the structure 3got, it fails to recognize the
> aromatic rings of the ligand A2F. [...] Is there a clever way I can fix
> this quickly without waiting for this to be fixed in a future version?
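The URL-template approach suggested above could be sketched roughly as follows. This is only an illustration: the helper name `ligand_sdf_url` is made up for this example, and the endpoint is copied verbatim from the URL in the message, which dates from 2019 and may have been changed or retired by RCSB since.

```python
from urllib.parse import urlencode

# Endpoint copied from the URL quoted in the message; assumed, not verified,
# to still be served in this form.
LIGAND_DOWNLOAD_ENDPOINT = "https://www.rcsb.org/pdb/download/downloadLigandFiles.do"


def ligand_sdf_url(ligand_id, pdb_id):
    """Build a ligand SDF download URL (hypothetical helper, not an RDKit or RCSB API)."""
    query = urlencode({
        "ligandIdList": ligand_id.upper(),
        "structIdList": pdb_id.upper(),
        "instanceType": "all",
        "excludeUnobserved": "false",
        "includeHydrogens": "false",
    })
    return f"{LIGAND_DOWNLOAD_ENDPOINT}?{query}"


# Rebuilds the URL from the message for ligand A2F in structure 3GOT.
print(ligand_sdf_url("A2F", "3got"))
```

The text downloaded from such a URL could then be handed to an RDKit SDF reader such as Chem.MolFromMolBlock, which, unlike the PDB reader, gets explicit bond orders from the file.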
Re: [Rdkit-discuss] Constructing a mol object from a PDB ligand
On 12/16/19 10:35 AM, Illimar Hugo Rekand wrote:
> Fair point.
>
> But when working in the 100s and 1000s range of PDB-files it would be nice to
> have fewer steps when designing a pipeline.

But what are the selection criteria? NMR structures are usually deposited with 20 models: do you want the ligand from every one, or only from the representative one? There's at least one PDB ID (I forget which) with 3 stable conformers, i.e. model 1 is not the representative structure. Structures annotated by PDB will have HETATM instead of ATOM for non-standard residues and ligands, but if your files haven't been processed by them, all bets are off. And so on.

--
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu
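To make the multi-model question concrete, here is a minimal plain-Python sketch that groups coordinate records by MODEL number, so a pipeline has to choose explicitly which model(s) to take ligands from. The function name `split_models` and the sample records are invented for illustration, and the coordinates are placeholders rather than real data.

```python
def split_models(pdb_text):
    """Return {model_number: [ATOM/HETATM lines]} for the text of a PDB file.

    Files without MODEL records are treated as a single model numbered 1.
    """
    models = {}
    current = 1
    for line in pdb_text.splitlines():
        record = line[:6].strip()
        if record == "MODEL":
            # The model serial number follows the record name.
            current = int(line[6:].split()[0])
        elif record in ("ATOM", "HETATM"):
            models.setdefault(current, []).append(line)
    return models


# Placeholder two-model file; coordinates are illustrative only.
SAMPLE = "\n".join([
    "MODEL        1",
    "ATOM      1  P     G A   1      10.000  10.000  10.000  1.00  0.00           P",
    "HETATM    2  C1  A2F A 101       1.000   2.000   3.000  1.00  0.00           C",
    "ENDMDL",
    "MODEL        2",
    "ATOM      1  P     G A   1      10.100  10.000  10.000  1.00  0.00           P",
    "HETATM    2  C1  A2F A 101       1.100   2.000   3.000  1.00  0.00           C",
    "ENDMDL",
    "END",
])

models = split_models(SAMPLE)
print(sorted(models))  # model numbers found in the file
```

Which model is the representative one is not recorded in these records, so that choice still has to come from elsewhere, as the message points out.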
Re: [Rdkit-discuss] Constructing a mol object from a PDB ligand
Thanks, Paolo, for a good and clear example.

I adapted your code into my workflow to calculate some Lipinski properties of RNA pdb-structures, and ran into some issues. I'm not sure whether I should start a new thread or add this to the one I already created.

I used the following code:

from rdkit import Chem
from rdkit.Chem import rdmolops, Lipinski
from urllib.request import urlopen
import gzip
import pprint

pp = pprint.PrettyPrinter(indent=4)

# Lipinski descriptors to compute, keyed by name
Lipinski_dic = {
    'FractionCSP3': Lipinski.FractionCSP3,
    'HeavyAtomCount': Lipinski.HeavyAtomCount,
    'NHOHCount': Lipinski.NHOHCount,
    'NOCount': Lipinski.NOCount,
    'NumAliphaticCarbocycles': Lipinski.NumAliphaticCarbocycles,
    'NumAliphaticHeterocycles': Lipinski.NumAliphaticHeterocycles,
    'NumAliphaticRings': Lipinski.NumAliphaticRings,
    'NumAromaticCarbocycles': Lipinski.NumAromaticCarbocycles,
    'NumAromaticHeterocycles': Lipinski.NumAromaticHeterocycles,
    'NumAromaticRings': Lipinski.NumAromaticRings,
    'NumHAcceptors': Lipinski.NumHAcceptors,
    'NumHDonors': Lipinski.NumHDonors,
    'NumHeteroatoms': Lipinski.NumHeteroatoms,
    'NumRotatableBonds': Lipinski.NumRotatableBonds,
    'NumSaturatedCarbocycles': Lipinski.NumSaturatedCarbocycles,
    'NumSaturatedHeterocycles': Lipinski.NumSaturatedHeterocycles,
    'NumSaturatedRings': Lipinski.NumSaturatedRings,
    'RingCount': Lipinski.RingCount,
}

url = "https://files.rcsb.org/download/1arj.pdb.gz"
pdb_data = gzip.decompress(urlopen(url).read())
mol = Chem.RWMol(Chem.MolFromPDBBlock(pdb_data))
# bonds that join a HETATM atom to a non-HETATM atom
bonds_to_cleave = {(b.GetBeginAtomIdx(), b.GetEndAtomIdx()) for b in mol.GetBonds()
                   if b.GetBeginAtom().GetPDBResidueInfo().GetIsHeteroAtom()
                   ^ b.GetEndAtom().GetPDBResidueInfo().GetIsHeteroAtom()}
[mol.RemoveBond(*b) for b in bonds_to_cleave]
# keep only the fragments made of HETATM atoms
hetatm_frags = [f for f in rdmolops.GetMolFrags(mol, asMols=True, sanitizeFrags=True)
                if f.GetNumAtoms()
                and f.GetAtomWithIdx(0).GetPDBResidueInfo().GetIsHeteroAtom()]

for hetatm in hetatm_frags:
    res_name = hetatm.GetAtomWithIdx(0).GetPDBResidueInfo().GetResidueName()
    calculated_props = {}
    for prop in Lipinski_dic:
        function = Lipinski_dic[prop]
        x = function(hetatm)
        calculated_props[prop] = x
    pp.pprint(calculated_props)

and as you can see, the properties of the ligand don't match what is expected (the number of SP3 atoms doesn't match). When parsing the structure 3got, it fails to recognize the aromatic rings of the ligand A2F. I'm assuming this is caused by RDKit not assigning bond orders correctly when reading RNA and DNA pdb files (something I have reported earlier on this mailing list)?

Running hetatm.UpdatePropertyCache(strict=True) does not remedy this problem. Is there a clever way I can fix this quickly without waiting for this to be fixed in a future version?

Illimar Rekand
Ph.D. candidate,
Brenk-lab, Haug-lab
Department of Biomedicine
Department of Chemistry
University of Bergen

From: Illimar Hugo Rekand
Sent: Monday, December 16, 2019 5:55:56 PM
To: Paolo Tosco
Subject: Re: [Rdkit-discuss] Constructing a mol object from a PDB ligand

Hey, Paolo,

thanks for a good and clear example!

all the best,

Illimar Rekand
Ph.D. candidate,
Brenk-lab, Haug-lab
Department of Biomedicine
Department of Chemistry
University of Bergen

From: Paolo Tosco
Sent: Monday, December 16, 2019 5:52:18 PM
To: Illimar Hugo Rekand; rdkit-discuss@lists.sourceforge.net
Subject: Re: [Rdkit-discuss] Constructing a mol object from a PDB ligand

Hi Illimar,

this gist:

https://gist.github.com/ptosco/2ee199b219b27e01052a7a1433b3bd22

shows a way to achieve this.

Cheers,
p.

On 16/12/2019 16:07, Illimar Hugo Rekand wrote:
> Hello, everyone
>
> Is there a simple way to create a mol object from just the HETATM/ligand
> lines from a pdb-file?
>
> Would it be viable to create a function where you could create a mol object
> from specific lines within a pdb-file?
>
> Illimar Rekand
> Ph.D. candidate,
> Brenk-lab, Haug-lab
> Department of Biomedicine
> Department of Chemistry
> University of Bergen
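One workaround that is sometimes used for missing bond orders, though it is not proposed anywhere in this thread, is to copy them from a reference molecule with RDKit's AllChem.AssignBondOrdersFromTemplate. The sketch below assumes that a SMILES for the ligand is available from another source (for example the ligand's PDB chemical component entry) and that its heavy-atom skeleton matches the PDB fragment. The wrapper name `assign_bond_orders` is invented for this example.

```python
def assign_bond_orders(fragment, template_smiles):
    """Copy bond orders from a SMILES template onto a PDB-derived fragment.

    fragment:        RDKit Mol built from PDB records (bond orders missing).
    template_smiles: SMILES with the same heavy-atom skeleton; must be
                     supplied by the caller, it is not looked up here.
    """
    # Imported lazily so this module can be loaded without RDKit installed.
    from rdkit import Chem
    from rdkit.Chem import AllChem

    template = Chem.MolFromSmiles(template_smiles)
    if template is None:
        raise ValueError(f"could not parse template SMILES: {template_smiles!r}")
    # Raises ValueError when the template does not match the fragment.
    return AllChem.AssignBondOrdersFromTemplate(template, fragment)


# Hypothetical use inside the loop over hetatm_frags from the message:
#     fixed = assign_bond_orders(hetatm, ligand_smiles)  # ligand_smiles: caller-supplied
#     Lipinski.NumAromaticRings(fixed)
```

This only helps when every fragment can be paired with a trustworthy template; for ligands with missing atoms, or with a protonation state that differs from the template, the match may fail.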
Re: [Rdkit-discuss] Constructing a mol object from a PDB ligand
Hi Illimar,

this gist:

https://gist.github.com/ptosco/2ee199b219b27e01052a7a1433b3bd22

shows a way to achieve this.

Cheers,
p.

On 16/12/2019 16:07, Illimar Hugo Rekand wrote:
> Hello, everyone
>
> Is there a simple way to create a mol object from just the HETATM/ligand
> lines from a pdb-file?
>
> Would it be viable to create a function where you could create a mol object
> from specific lines within a pdb-file?
>
> Illimar Rekand
> Ph.D. candidate,
> Brenk-lab, Haug-lab
> Department of Biomedicine
> Department of Chemistry
> University of Bergen
Re: [Rdkit-discuss] Constructing a mol object from a PDB ligand
Fair point.

But when working in the 100s and 1000s range of PDB-files it would be nice to have fewer steps when designing a pipeline.

Illimar Rekand
Ph.D. candidate,
Brenk-lab, Haug-lab
Department of Biomedicine
Department of Chemistry
University of Bergen

From: Dimitri Maziuk via Rdkit-discuss
Sent: Monday, December 16, 2019 5:24:49 PM
To: rdkit-discuss@lists.sourceforge.net
Subject: Re: [Rdkit-discuss] Constructing a mol object from a PDB ligand

On 12/16/2019 10:07 AM, Illimar Hugo Rekand wrote:
> Would it be viable to create a function where you could create a mol object
> from specific lines within a pdb-file?

A PDB file is simple text. There are any number of utilities to extract the lines you want, incl. a plain text editor; why spend time reinventing the wheel?

Dima
Re: [Rdkit-discuss] Constructing a mol object from a PDB ligand
On 12/16/2019 10:07 AM, Illimar Hugo Rekand wrote:
> Would it be viable to create a function where you could create a mol object
> from specific lines within a pdb-file?

A PDB file is simple text. There are any number of utilities to extract the lines you want, incl. a plain text editor; why spend time reinventing the wheel?

Dima
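Since a PDB file is plain text, the line extraction discussed here needs nothing beyond string slicing. The following is a minimal sketch, not an RDKit function: the name `extract_hetatm_block`, the default of skipping water, and the sample records (with placeholder coordinates) are all assumptions made for illustration.

```python
def extract_hetatm_block(pdb_text, resname=None, skip=("HOH",)):
    """Return the HETATM records of pdb_text as a PDB block string.

    resname: keep only this residue name (columns 18-20), or all if None.
    skip:    residue names to drop; water by default.
    """
    kept = []
    for line in pdb_text.splitlines():
        if not line.startswith("HETATM"):
            continue
        name = line[17:20].strip()  # residue name field of the fixed-width record
        if name in skip or (resname is not None and name != resname):
            continue
        kept.append(line)
    return "\n".join(kept + ["END"]) + "\n"


# Placeholder records; coordinates are illustrative only.
SAMPLE = "\n".join([
    "ATOM      1  P     G A   1      10.000  10.000  10.000  1.00  0.00           P",
    "HETATM    2  C1  A2F A 101       1.000   2.000   3.000  1.00  0.00           C",
    "HETATM    3  O   HOH A 201       5.000   5.000   5.000  1.00  0.00           O",
])

ligand_block = extract_hetatm_block(SAMPLE, resname="A2F")
print(ligand_block)
```

The resulting block can be passed to Chem.MolFromPDBBlock, with the caveats raised elsewhere in the thread: bond orders are still not encoded in these lines, and HETATM is only a reliable ligand marker in files annotated by the PDB.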