Greetings,
Maybe I should have posted this query as a comment on Greg's blog post (
https://rdkit.blogspot.com/2018/02/introducing-substructlibrary.html), but I
am posting it here instead for greater visibility. I have many fragments that
are active against a protein target (validated by NMR), and I want to screen
a very large database for molecules containing those fragments. I therefore
tried SubstructLibrary for greater efficiency. However, the results I get
differ from those of a direct PatternFingerprint comparison and of a
substructure search on the Mol object. Try the simple example below:
from rdkit import Chem, DataStructs
from rdkit.Chem import rdSubstructLibrary
SMILES1 = 'O=C(O)c1cccnc1'
SMILES2 = 'c1nccc(c1C(=O)O)-c2cc(Cl)ccc2'
# Remove hydrogens; otherwise the valence of the fragment atoms that serve
# as extension points would have to be modified by hand.
mol1 = Chem.RemoveHs( Chem.MolFromSmiles(SMILES1, sanitize=False) )
mol2 = Chem.RemoveHs( Chem.MolFromSmiles(SMILES2, sanitize=False) )
# AVENUE 1: Library
mols2 = rdSubstructLibrary.CachedTrustedSmilesMolHolder()
mols2.AddSmiles( Chem.MolToSmiles(mol2) )
fps = rdSubstructLibrary.PatternHolder()
fp2 = Chem.PatternFingerprint(mol2, fpSize=4096)
fps.AddFingerprint( fp2 )
library = rdSubstructLibrary.SubstructLibrary(mols2, fps)
print("SubstructLibrary:", library.HasMatch(mol1, useChirality=False) )
# AVENUE 2: PatternFingerprint comparison
fp1 = Chem.PatternFingerprint(mol1, fpSize=4096)
print("PatternFingerprint:", DataStructs.AllProbeBitsMatch(fp1, fp2))
# AVENUE 3: HasSubstructMatch
print("HasSubstructMatch:", mol2.HasSubstructMatch(mol1))
I strip the hydrogens from both molecules to avoid manually modifying the
fragment atoms (in SMILES1 here) that can serve as linking or extension
points. Why do the three results not agree in this case? Am I using
SubstructLibrary incorrectly?
I thank you in advance.
Thomas
--
======================================================================
Dr. Thomas Evangelidis
Research Scientist
IOCB - Institute of Organic Chemistry and Biochemistry of the Czech Academy
of Sciences <https://www.uochb.cz/web/structure/31.html?lang=en>, Prague,
Czech Republic
&
CEITEC - Central European Institute of Technology
<https://www.ceitec.eu/>, Brno,
Czech Republic
Twitter: tevangelidis <https://twitter.com/tevangelidis>,
LinkedIn: Thomas Evangelidis
<https://www.linkedin.com/in/thomas-evangelidis-495b45125/>
website: https://sites.google.com/site/thomasevangelidishomepage/