Hi Thomas,

I agree that this is a much better place to ask a question than in the
comments of my blog post. :-)

The problem you're having here is that the PatternHolder() class assumes
that the fingerprints being used have the default size (2048 bits). You are
storing fingerprints with 4096 bits, but when the SubstructLibrary generates
a fingerprint for a query molecule it only generates a 2048-bit one. The
stored and query fingerprints then have different lengths, which causes the
substructure screenout to fail.

This is certainly a bug in the SubstructLibrary (it should, at the very
least, generate an error when you try to do this), but it's easy enough to
fix in your code: just stop specifying the length of the pattern
fingerprints.
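
A minimal sketch of that fix, assuming an RDKit build where both
Chem.PatternFingerprint() and PatternHolder() default to the same
fingerprint size. The variable names (query, target, has_match,
direct_match) are illustrative, and the molecules are parsed with default
sanitization rather than the sanitize=False / RemoveHs route used in the
original script:

```python
from rdkit import Chem
from rdkit.Chem import rdSubstructLibrary

# The fragment (query) and the larger molecule (target) from the original post.
query = Chem.MolFromSmiles('O=C(O)c1cccnc1')
target = Chem.MolFromSmiles('c1nccc(c1C(=O)O)-c2cc(Cl)ccc2')

# Store the target as a trusted SMILES string, as in the original script.
mols = rdSubstructLibrary.CachedTrustedSmilesMolHolder()
mols.AddSmiles(Chem.MolToSmiles(target))

# No fpSize argument here: the stored fingerprint keeps the default length,
# which is assumed to match what the library generates for the query.
fps = rdSubstructLibrary.PatternHolder()
fps.AddFingerprint(Chem.PatternFingerprint(target))

library = rdSubstructLibrary.SubstructLibrary(mols, fps)

# Compare the library answer with a direct substructure search.
has_match = library.HasMatch(query)
direct_match = target.HasSubstructMatch(query)
print("SubstructLibrary:", has_match)
print("HasSubstructMatch:", direct_match)
```

With matching fingerprint sizes, the two results should agree.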

Best,
-greg



On Mon, Aug 31, 2020 at 3:57 PM Thomas Evangelidis <teva...@gmail.com>
wrote:

> Greetings,
>
> Maybe I should have posted this query as a comment on Greg's blog post (
> https://rdkit.blogspot.com/2018/02/introducing-substructlibrary.html) but
> I write it here instead for greater visibility. I have many active fragments
> against a protein target (validated by NMR) and I want to screen a very
> large database for molecules containing those fragments. Therefore I
> tried the SubstructLibrary for greater efficiency. However, the results I
> get differ from direct PatternFingerprint comparison and substructure
> search using the Mol object. Try this simple example below:
>
> from rdkit import Chem, DataStructs
> from rdkit.Chem import rdSubstructLibrary
>
> SMILES1 = 'O=C(O)c1cccnc1'
> SMILES2 = 'c1nccc(c1C(=O)O)-c2cc(Cl)ccc2'
> # Remove hydrogens; otherwise you would have to modify the valence of the
> # atoms in the fragment that can facilitate extension by hand
> mol1 = Chem.RemoveHs( Chem.MolFromSmiles(SMILES1, sanitize=False) )
> mol2 = Chem.RemoveHs( Chem.MolFromSmiles(SMILES2, sanitize=False) )
>
> # AVENUE 1: Library
> mols2 = rdSubstructLibrary.CachedTrustedSmilesMolHolder()
> mols2.AddSmiles( Chem.MolToSmiles(mol2) )
> fps = rdSubstructLibrary.PatternHolder()
> fp2 = Chem.PatternFingerprint(mol2, fpSize=4096)
> fps.AddFingerprint( fp2 )
> library = rdSubstructLibrary.SubstructLibrary(mols2, fps)
> print("SubstructLibrary:", library.HasMatch(mol1, useChirality=False) )
>
> # AVENUE 2: PatternFingerprint comparison
> fp1 = Chem.PatternFingerprint(mol1, fpSize=4096)
> print("PatternFingerprint:", DataStructs.AllProbeBitsMatch(fp1, fp2))
>
> # AVENUE 3: HasSubstructMatch
> print("HasSubstructMatch:", mol2.HasSubstructMatch(mol1))
>
>
> I strip out the hydrogens from both molecules in order to avoid manual
> modification of the atoms in the fragment (SMILES1 in this case) that can
> facilitate linking or extension. Why do the results not agree in this case?
> Am I using SubstructLibrary incorrectly?
>
> I thank you in advance.
> Thomas
>
> --
>
> ======================================================================
>
> Dr. Thomas Evangelidis
>
> Research Scientist
>
> IOCB - Institute of Organic Chemistry and Biochemistry of the Czech
> Academy of Sciences <https://www.uochb.cz/web/structure/31.html?lang=en>, 
> Prague,
> Czech Republic
>   &
> CEITEC - Central European Institute of Technology <https://www.ceitec.eu/>
> , Brno, Czech Republic
>
> email: teva...@gmail.com, Twitter: tevangelidis
> <https://twitter.com/tevangelidis>, LinkedIn: Thomas Evangelidis
> <https://www.linkedin.com/in/thomas-evangelidis-495b45125/>
>
> website: https://sites.google.com/site/thomasevangelidishomepage/
>
>
>
> _______________________________________________
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>