I'm not sure why you'd want to reimplement something that's already there,
but if this works better for you...

The easiest way to get a single function you can call would be something
like:
In [18]: def MolToGenericScaffold(mol):
    ...:     return MurckoScaffold.MakeScaffoldGeneric(MurckoScaffold.GetScaffoldForMol(mol))
    ...:

In [19]: Chem.MolToSmiles(MolToGenericScaffold(Chem.MolFromSmiles('CCc1ccc(O)cc1C(=O)C1CC1')))
Out[19]: 'CC(C1CCCCC1)C1CC1'
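
Note that this result still carries one terminal carbon: the former carbonyl
oxygen, which the scaffold step retains. The quoted code below goes further
and strips terminal atoms repeatedly until none are left. A minimal,
RDKit-free sketch of that pruning idea on a plain adjacency map might look
like the following; the function name prune_terminal_atoms and the hand-built
atom numbering are assumptions made for illustration, not part of the RDKit
API.

```python
def prune_terminal_atoms(adjacency):
    """Repeatedly drop atoms that have exactly one neighbor.

    `adjacency` maps an atom index to the set of its neighbor indices.
    Returns a new, pruned adjacency map; the input is left untouched.
    """
    graph = {atom: set(nbrs) for atom, nbrs in adjacency.items()}
    while True:
        terminal = [atom for atom, nbrs in graph.items() if len(nbrs) == 1]
        if not terminal:
            return graph
        for atom in terminal:
            for nbr in graph.pop(atom):
                # the neighbor may already be gone if it was terminal too
                if nbr in graph:
                    graph[nbr].discard(atom)


# Hand-numbered graph of 'CC(C1CCCCC1)C1CC1' (hypothetical numbering):
# atom 0 is the terminal carbon, atom 1 the linker,
# atoms 2-7 the six-membered ring, atoms 8-10 the three-membered ring.
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 7), (7, 2),
         (1, 8), (8, 9), (9, 10), (10, 8)]
adjacency = {}
for a, b in edges:
    adjacency.setdefault(a, set()).add(b)
    adjacency.setdefault(b, set()).add(a)

framework = prune_terminal_atoms(adjacency)
print(sorted(framework))  # atom 0 is gone; the linker and both rings remain
```

In this sketch only rings and ring linkers survive, and a fully acyclic input
is pruned down to nothing or to a single isolated atom.
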
-greg
On Tue, Apr 27, 2021 at 4:32 AM Francois Berenger wrote:
> On 27/04/2021 10:12, Francois Berenger wrote:
> > On 26/04/2021 23:35, Greg Landrum wrote:
> >> Hi Francois,
> >>
> >> The implementation which is there does, I believe, the right thing.
> >> However... first you need to find the Murcko Scaffold, then you can
> >> convert that scaffold to the generic form:
> >>
> >>> In [5]: m = Chem.MolFromSmiles('CCc1ccc(O)cc1C(=O)C1CC1')
> >>> In [6]: scaff = MurckoScaffold.GetScaffoldForMol(m)
> >>> In [7]: Chem.MolToSmiles(scaff)
> >>> Out[7]: 'O=C(c1ccccc1)C1CC1'
> >>> In [8]: framework = MurckoScaffold.MakeScaffoldGeneric(scaff)
> >>> In [9]: print(Chem.MolToSmiles(framework))
> >>> CC(C1CCCCC1)C1CC1
> >
> > Ok, maybe this two-step process is a little better, but it is still
> > not exactly what I would expect in some cases.
> >
> > I'll report back if I end up writing something I prefer.
>
> Hello,
>
> I ended up with this:
> ---
> from rdkit import Chem
>
> def find_terminal_atoms(mol):
>     # indices of atoms with exactly one bond
>     return [a.GetIdx() for a in mol.GetAtoms() if len(a.GetBonds()) == 1]
>
> # Bemis, G. W., & Murcko, M. A. (1996).
> # "The properties of known drugs. 1. Molecular frameworks."
> # Journal of medicinal chemistry, 39(15), 2887-2893.
> def BemisMurckoFramework(mol):
>     # keep only Heavy Atoms (HA)
>     only_HA = Chem.RemoveHs(mol)
>     # switch all HA to Carbon
>     rw_mol = Chem.RWMol(only_HA)
>     for i in range(rw_mol.GetNumAtoms()):
>         rw_mol.ReplaceAtom(i, Chem.Atom(6))
>     # switch all non single bonds to single;
>     # collect index pairs first so the bond list is not modified mid-iteration
>     non_single_bonds = [(b.GetBeginAtomIdx(), b.GetEndAtomIdx())
>                         for b in rw_mol.GetBonds()
>                         if b.GetBondType() != Chem.BondType.SINGLE]
>     for j, k in non_single_bonds:
>         rw_mol.RemoveBond(j, k)
>         rw_mol.AddBond(j, k, Chem.BondType.SINGLE)
>     # as long as there are terminal atoms, remove them
>     terminal_atoms = find_terminal_atoms(rw_mol)
>     while terminal_atoms:
>         # remove from the highest index down so remaining indices stay valid;
>         # RemoveAtom also deletes the bonds attached to the atom
>         for idx in sorted(terminal_atoms, reverse=True):
>             rw_mol.RemoveAtom(idx)
>         terminal_atoms = find_terminal_atoms(rw_mol)
>     return rw_mol.GetMol()
> ---
>
> I don't claim this is very efficient Python code. I am not very good at
> snake charming.
>
> Regards,
> F.
>
> >> Best,
> >> -greg
> >>
> >> On Mon, Apr 26, 2021 at 11:15 AM Francois Berenger wrote:
> >>
> >>> Hello,
> >>>
> >>> I am trying MurckoScaffold.MakeScaffoldGeneric(mol),
> >>> but this keeps the side chains.
> >>>
> >>> My understanding of BM scaffolds, however, is that only rings
> >>> and ring linkers should be kept.
> >>>
> >>> The fact that the rdkit implementation keeps the
> >>> side chains makes Murcko scaffolds a much less powerful filter
> >>> to enforce molecular diversity.
> >>>
> >>> And I don't even see any option to force the standard/vanilla
> >>> behavior.
> >>> Or, am I missing something?
> >>>
> >>> Regards,
> >>> F.
> >>>
> >>> _______________________________________________
> >>> Rdkit-discuss mailing list
> >>> [email protected]
> >>> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss