Hi Paolo,

That's cool, thanks. This may also help me solve my problem of R-group label numbering not taking the actual R-group numbering into account (i.e. if a molecule has R8 and R5 as its sole R-group definitions, they get the labels R1 and R2). I was also in contact with Brian Kelley and he suggested fixing it in the underlying codebase, so I hope this will be fixed in the next version :)

Cheers
Nik
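Until such a fix lands, the relabelling described above could be undone after the fact. The following is only a minimal sketch in plain Python: `relabel_rgroup_columns` is a hypothetical helper (not part of RDKit), the column dictionary is a stand-in for the kind of `{'Core': [...], 'R1': [...]}` mapping returned by `GetRGroupsAsColumns()`, and the correspondence between assigned and original labels is an assumption that has to be checked by hand, since the thread does not say which original label ends up as R1.

```python
def relabel_rgroup_columns(columns, label_map):
    """Return a copy of `columns` with keys renamed according to `label_map`.

    Keys that do not appear in `label_map` (for example 'Core') are kept
    unchanged. A clash between two resulting labels raises ValueError.
    """
    renamed = {}
    for key, values in columns.items():
        new_key = label_map.get(key, key)
        if new_key in renamed:
            raise ValueError(f"duplicate column label after renaming: {new_key}")
        renamed[new_key] = values
    return renamed


# Hypothetical stand-in data: strings instead of molecules.
columns = {"Core": ["core_a", "core_b"], "R1": ["C", "N"], "R2": ["O", "Cl"]}

# ASSUMPTION: which assigned label corresponds to which original label
# must be verified for the actual core; this mapping is illustrative only.
label_map = {"R1": "R5", "R2": "R8"}

relabelled = relabel_rgroup_columns(columns, label_map)
print(sorted(relabelled))
```

The helper leaves the input dictionary untouched, so the original output stays available for comparison.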
From: Paolo Tosco <paolo.tosco.m...@gmail.com>
Sent: Thursday, December 13, 2018 11:09 AM
To: Stiefl, Nikolaus <nikolaus.sti...@novartis.com>; RDKit Discuss <rdkit-discuss@lists.sourceforge.net>
Subject: Re: [Rdkit-discuss] RGroup matching in RGroup decomposition code

Hi Nik,

There is a way to achieve what you describe, even though it is slightly cumbersome:

    from rdkit import Chem
    from rdkit.Chem import rdmolops
    from rdkit.Chem.Draw import MolsToGridImage, IPythonConsole
    from rdkit.Chem.rdRGroupDecomposition import (
        RGroupDecomposition, RGroupDecompositionParameters)

    smis = ['Cc1ccnc(O)c1', 'Cc1cc(Cl)ccn1', 'Nc1ccccn1',
            'Nc1ccc(Br)cn1', 'c1ccncc1']
    mols = [Chem.MolFromSmiles(smi) for smi in smis]
    MolsToGridImage(mols)

[image001.png]

    params = RGroupDecompositionParameters()
    # rather than using the built-in flag we will manually
    # adjust the query in two steps using AdjustQueryProperties()
    params.onlyMatchAtRGroups = False

    # just atom number the rgroups
    core1 = Chem.MolFromSmiles('n1ccc([*:2])cc([*:1])1')

    # make dummies queries
    core1_params = rdmolops.AdjustQueryParameters()
    core1_params.makeDummiesQueries = True
    core1_params.adjustDegree = False
    core1 = rdmolops.AdjustQueryProperties(core1, core1_params)

    # change the atoms connected to the dummies into dummies
    former_atomic_nums = {}
    for b in core1.GetBonds():
        if (b.GetBeginAtom().GetAtomicNum() == 0):
            a = b.GetEndAtom()
        elif (b.GetEndAtom().GetAtomicNum() == 0):
            a = b.GetBeginAtom()
        else:
            continue
        former_atomic_nums[a.GetIdx()] = a.GetAtomicNum()
        a.SetAtomicNum(0)

    # this has the same effect as setting onlyMatchAtRGroups to True
    # but we can avoid applying it to the atoms connected to the R groups
    core1_params.adjustHeavyDegreeFlags = Chem.ADJUST_IGNOREDUMMIES
    core1_params.makeDummiesQueries = False
    core1_params.adjustDegree = False
    core1_params.adjustHeavyDegree = True
    core1 = rdmolops.AdjustQueryProperties(core1, core1_params)

    # restore the original atomic numbers
    for i, an in former_atomic_nums.items():
        core1.GetAtomWithIdx(i).SetAtomicNum(an)

    rg1 = RGroupDecomposition(core1, params)
    failMols = []
    for m in mols:
        res = rg1.Add(m)
        if res < 0:
            failMols.append(m)
    rg1.Process()

True

    print("FailedMols: %s"%" ".join([Chem.MolToSmiles(m) for m in failMols]))

FailedMols: Nc1ccc(Br)cn1

    core1

[image002.png]

    d = rg1.GetRGroupsAsColumns(asSmiles=False)
    MolsToGridImage(d['Core'])

[image003.png]

    MolsToGridImage(d['R1'])

[image004.png]

    MolsToGridImage(d['R2'])

[image005.png]

Hope that helps,
cheers
p.

On 12/11/18 11:01, Stiefl, Nikolaus wrote:

Hi all,

I was playing around with the RGroup decomposition code and must say that I am pretty impressed by it. The fact that one can work directly with an MDL R-group file and that the output is a pandas DataFrame makes analysis really slick - well done!

However, one thing that irritates me is that when I have R-groups defined in my core and enforce matching only at R-groups, molecules that have hydrogen atoms in those positions seem to be ignored in the "Add" step. I would expect them to be included as long as the molecules don't have additional heavy atoms in positions that are not defined as R-groups in the core.
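The expectation stated above (hydrogen is acceptable at an R position, heavy atoms are not acceptable anywhere else on the core) can be written down as a plain substructure check, independent of the decomposition code. This is only a sketch of that criterion, not how RGroupDecomposition works internally: `substituted_only_at_r_positions`, `RING` and `R_POSITIONS` are hypothetical names, and the ring query and position indices are hard-coded for the pyridine core used in this thread.

```python
from rdkit import Chem

# Bare pyridine ring as a SMARTS query.
# Query atom order: n(0) c(1) c(2) c(3) c(4) c(5)
RING = Chem.MolFromSmarts("n1ccccc1")

# Ring positions that carry R-groups in the core 'n1ccc([*:2])cc([*:1])1':
# query index 3 (R2, para to N) and query index 5 (R1, next to N).
R_POSITIONS = {3, 5}


def substituted_only_at_r_positions(mol):
    """True if some mapping of the ring onto `mol` puts every heavy-atom
    substituent on an R position (implicit hydrogens are always fine)."""
    # uniquify=False returns both ring orientations, which matters
    # because the labelled positions are not symmetric.
    for match in mol.GetSubstructMatches(RING, uniquify=False):
        ring_atoms = set(match)
        ok = True
        for q_idx, a_idx in enumerate(match):
            if q_idx in R_POSITIONS:
                continue
            atom = mol.GetAtomWithIdx(a_idx)
            # Any heavy neighbour outside the ring is a disallowed substituent.
            if any(n.GetIdx() not in ring_atoms and n.GetAtomicNum() > 1
                   for n in atom.GetNeighbors()):
                ok = False
                break
        if ok:
            return True
    return False


smis = ['Cc1ccnc(O)c1', 'Cc1cc(Cl)ccn1', 'Nc1ccccn1', 'Nc1ccc(Br)cn1', 'c1ccncc1']
accepted = [substituted_only_at_r_positions(Chem.MolFromSmiles(s)) for s in smis]
for smi, ok in zip(smis, accepted):
    print(smi, "accepted" if ok else "rejected")
```

Under this criterion only the 5-bromo compound is rejected, which is the behaviour asked for here.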
______________ snip ____________________
from rdkit import Chem
from rdkit.Chem.rdRGroupDecomposition import RGroupDecomposition, RGroupDecompositionParameters

smis = ['Cc1ccnc(O)c1', 'Cc1cc(Cl)ccn1', 'Nc1ccccn1', 'Nc1ccc(Br)cn1', 'c1ccncc1']
mols = [Chem.MolFromSmiles(smi) for smi in smis]

params = RGroupDecompositionParameters()
params.onlyMatchAtRGroups = True

# just atom number the rgroups
core1 = Chem.MolFromSmiles('n1ccc([*:2])cc([*:1])1')
rg1 = RGroupDecomposition(core1, params)

failMols = []
for m in mols:
    res = rg1.Add(m)
    if res < 0:
        failMols.append(m)
rg1.Process()

print("FailedMols: %s"%" ".join([Chem.MolToSmiles(m) for m in failMols]))
____________ end snip ________________

The output shows that molecules 3-5 are not included at the "Add" step:

>> FailedMols: Nc1ccccn1 Nc1ccc(Br)cn1 c1ccncc1

For molecule 4 (the 5-bromo-substituted aminopyridine) I agree; however, I don't understand how I can make sure mols 3 and 5 are also included ... is there a magic parameter that I can set?

Cheers
Nik

_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss