Hi everyone,

This issue was solved with Greg off-list.
Turns out that the receptor contains 5 amino acids with AltLoc. 4 of these were cleaned up during preparation of the receptor for docking; the 5th was missed, and that turned out to be the culprit.

Cheers
Markus

From: Greg Landrum <greg.land...@gmail.com>
Sent: Thursday, June 6, 2019 8:18 PM
To: Mateo Vacacela <mvacac...@cdrd.ca>
Cc: rdkit-discuss@lists.sourceforge.net
Subject: Re: [Rdkit-discuss] Sanitization Error: Explicit valence greater than permitted for normal protein

Hi Mateo,

On Thu, Jun 6, 2019 at 6:29 PM Mateo Vacacela <mvacac...@cdrd.ca> wrote:

> I'm getting the following error when trying to sanitize a protein from a published pdb file (1E66):
>
> ValueError: Sanitization error: Explicit valence for atom # 1254 C, 5, is greater than permitted

The error message is telling you what the problem is: there's a carbon atom in the system that has a valence (=number of bonds - charges) of 5. That's illegal for carbon. This type of error typically indicates a problem with the input file.

I will note that I downloaded the PDB file for 1E66 from the PDB website and it worked fine for me, so something may have happened to the file you are using?

In [6]: import requests

In [9]: d = requests.get('https://files.rcsb.org/download/1E66.pdb')

In [10]: d.content[:5]
Out[10]: b'HEADE'

In [12]: with open('1e66.pdb','wb+') as outf:
    ...:     outf.write(d.content)
    ...:

In [13]: m = Chem.MolFromPDBFile('1e66.pdb',sanitize=False,removeHs=False)

In [14]: nm = Chem.SanitizeMol(m)

In [15]:

> Here is the script I'm running to recreate the error. I've replicated it based off of a script from the deepchem library:

This script is very strange.
> ######## Script Starts ########
> import tempfile
> import os
> from rdkit import Chem
> from rdkit.Chem import rdmolops
>
> protein_pdb = 'receptor.pdb'
> with open(protein_pdb) as protein_file:
>     protein_pdb_lines = protein_file.readlines()
>
> tempdir = tempfile.mkdtemp()
> protein_pdb_file = os.path.join(tempdir, "protein.pdb")
> with open(protein_pdb_file, "w") as protein_f:
>     protein_f.writelines(protein_pdb_lines)

This first bit seems to just be copying the file; I'm not sure why you would want to do that.

> molecule_file = protein_pdb_file
> my_mol = Chem.MolFromPDBFile(str(molecule_file), sanitize=False, removeHs=False)
> mol = Chem.SanitizeMol(my_mol) # Error occurs here

This doesn't make sense. It's shorter (and produces the same result) to just do:

mol = Chem.MolFromPDBFile(str(molecule_file), removeHs=False)

That will sanitize the structure but leave the Hs.

-greg
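
For anyone who finds this thread with the same error: the cause reported at the top (a residue with a leftover AltLoc entry) can be checked for before sanitizing. The following is only a minimal sketch, not something posted in the thread. It assumes a standard fixed-column PDB file, where the alternate location indicator sits in column 17 of ATOM/HETATM records, and the function name find_altloc_residues is made up for illustration. It uses plain Python and does not call RDKit.

```python
from collections import defaultdict


def find_altloc_residues(pdb_lines):
    """Map (chain, residue name, residue number) to the AltLoc labels found.

    Hypothetical helper: relies on the fixed-column PDB layout, where
    column 17 (index 16) of an ATOM/HETATM record holds the altLoc flag.
    """
    found = defaultdict(set)
    for line in pdb_lines:
        # Only coordinate records carry an altLoc indicator.
        if not line.startswith(("ATOM  ", "HETATM")):
            continue
        # Skip records too short to hold the residue number field.
        if len(line) < 26:
            continue
        altloc = line[16]
        if altloc != " ":
            chain = line[21]
            resname = line[17:20].strip()
            resseq = int(line[22:26])
            found[(chain, resname, resseq)].add(altloc)
    return {key: sorted(labels) for key, labels in found.items()}


# Small in-memory demo with made-up records: SER 10 has two alternate
# locations (A and B), GLY 11 has none.
sample = [
    "ATOM      1  CA ASER A  10      11.000  12.000  13.000  0.50 20.00           C\n",
    "ATOM      2  CA BSER A  10      11.500  12.500  13.500  0.50 20.00           C\n",
    "ATOM      3  CA  GLY A  11      14.000  15.000  16.000  1.00 20.00           C\n",
]
for residue, labels in find_altloc_residues(sample).items():
    print(residue, labels)
```

To run it on a prepared receptor, pass an open file handle, for example the 'receptor.pdb' from the script above. Any residue it still lists after preparation is worth inspecting before calling Chem.MolFromPDBFile.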