Thank you so much!
What I ended up doing follows the same basic idea, although with nowhere
near the level of detail you put into your program. I'm only comparing the
structures in pairs, as follows.
(Sorry for the mess: it's part of a larger system, and I just copied the
relevant parts.)
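from rdkit import Chem
from rdkit.Chem import rdFMCS
from rdkit.Chem.Scaffolds import MurckoScaffold


def scaffold_matching(query_smi, scaff_smi):
    """
    Checks if the scaffold from scaff_smi is
    contained in the query_smi.
    Uses a stringent scaffold test.
    Returns the fraction of scaffold atoms covered by the MCS
    (0 if either SMILES cannot be parsed or the scaffold is empty).
    """
    sca = Chem.MolFromSmiles(scaff_smi)
    que = Chem.MolFromSmiles(query_smi)
    match = 0
    # MolFromSmiles returns None for a SMILES it cannot parse, and an
    # acyclic template gives an empty scaffold (avoid dividing by zero).
    if sca is not None and que is not None and sca.GetNumAtoms() > 0:
        maxMatch = sca.GetNumAtoms()
        match = rdFMCS.FindMCS([sca, que],
                               atomCompare=rdFMCS.AtomCompare.CompareAny,
                               bondCompare=rdFMCS.BondCompare.CompareOrder,
                               ringMatchesRingOnly=True,
                               completeRingsOnly=True,
                               ).numAtoms / maxMatch
    return match


if __name__ == "__main__":
    # Placeholders: substitute real SMILES strings before running.
    template_smiles = "<SMILES_FOR_SOME_BASE_MOLECULE>"
    query_smiles = "<SMILES_FOR_SOME_QUERY_MOLECULE>"
    template_mol = Chem.MolFromSmiles(template_smiles)
    core = MurckoScaffold.GetScaffoldForMol(template_mol)
    scaffold = Chem.MolToSmiles(core)
    match = scaffold_matching(query_smiles, scaffold)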
--
Gustavo Seabra
From: Andrew Dalke <[email protected]>
Sent: Monday, November 23, 2020 7:59 AM
To: Gustavo Seabra <[email protected]>
Cc: [email protected]
Subject: Re: [Rdkit-discuss] Partial substructure match?
On Nov 19, 2020, at 17:48, Gustavo Seabra <[email protected]> wrote:
> Is it possible to search for *partial* substructure matches using RDKit?
> ...
> For example, if the pattern is a naphthalene and the molecule to
> search has a benzene, that would count as a 60% match.
A number of people pointed out that RDKit's MCS feature might be
appropriate.
I've attached an example program based around that.
For example, the default is your two structures:
% python mcs_search.py
No --query specified, using naphthalene as the default.
No --target or --targets specified, using phenol as the default.
Target_ID: phenol
nAtoms: 7
nBonds: 7
match_nAtoms: 6
match_nBonds: 6
atom_overlap: 0.600
bond_overlap: 0.545
atom_Tanimoto: 0.545
bond_Tanimoto: 0.500
I'll reverse it by specifying the SMILES on the command-line.
% python mcs_search.py --query 'c1ccccc1O' --target 'c1ccc2ccccc2c1'
Target_ID: query
nAtoms: 10
nBonds: 11
match_nAtoms: 6
match_nBonds: 6
atom_overlap: 0.857
bond_overlap: 0.857
atom_Tanimoto: 0.545
bond_Tanimoto: 0.500
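The scores in the two runs above are consistent with simple ratios of the reported counts: overlap as matched / query, and Tanimoto as matched / (query + target - matched). The sketch below reproduces them from the counts alone. The formulas are inferred from the printed numbers, not taken from mcs_search.py, and the function name mcs_scores is hypothetical.

```python
def mcs_scores(query_atoms, query_bonds, target_atoms, target_bonds,
               match_atoms, match_bonds):
    """Overlap and Tanimoto scores computed from atom and bond counts.

    Assumed definitions (inferred from the example output):
      overlap  = matched / query
      Tanimoto = matched / (query + target - matched)
    """
    return {
        "atom_overlap": match_atoms / query_atoms,
        "bond_overlap": match_bonds / query_bonds,
        "atom_Tanimoto": match_atoms / (query_atoms + target_atoms - match_atoms),
        "bond_Tanimoto": match_bonds / (query_bonds + target_bonds - match_bonds),
    }


# Naphthalene query (10 atoms, 11 bonds) against phenol (7 atoms, 7 bonds),
# with a 6-atom, 6-bond common substructure, and then the reverse.
forward = mcs_scores(10, 11, 7, 7, 6, 6)
reverse = mcs_scores(7, 7, 10, 11, 6, 6)

for name, scores in (("forward", forward), ("reverse", reverse)):
    print(name, {key: f"{value:.3f}" for key, value in scores.items()})
```

Under these definitions the overlap depends on which structure is the query, while the Tanimoto is symmetric, which matches the fact that only the overlap values change when the query and target are swapped.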
The program includes options to configure the FindMCS() parameters.
In addition, if chemfp 3.x is installed then some additional features are
available, like the following example, which applies the MCS search to all
records in ChEBI:
% python mcs_search.py --query 'COC(=O)C1C(OC(=O)c2ccccc2)CC2CCC1N2C' \
    --targets ~/databases/ChEBI_lite.sdf.gz --id-tag 'ChEBI ID'
Target_ID   nAtoms  nBonds  match_nAtoms  match_nBonds  atom_overlap  bond_overlap  atom_Tanimoto  bond_Tanimoto
CHEBI:776       21      24             9             8         0.409         0.333          0.265          0.200
CHEBI:1148       7       6             6             5         0.273         0.208          0.261          0.200
CHEBI:1734      19      21            16            15         0.727         0.625          0.640          0.500
CHEBI:1895       9       9             9             8         0.409         0.333          0.409          0.320
...
On Nov 20, 2020, at 15:56, Gustavo Seabra <[email protected]> wrote:
> Is it possible to get a partial match with substructure search?
No.
Andrew
[email protected]
_______________________________________________
Rdkit-discuss mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss