We've been looking at something similar - the following code prints a canonical SMILES string for the environment around each atom, at radii of 1...maxradius.

    from rdkit import Chem

    def atomenvironments(mol, atno, maxradius=6):
        for a in mol.GetAtoms():
            idx = a.GetIdx()
            # radius 0: just the atom itself
            print(atno, idx, 0, a.GetSmarts())
            for iradius in range(0, maxradius + 1):
                # bonds within iradius bonds of the centre atom
                env = Chem.FindAtomEnvironmentOfRadiusN(mol, iradius, idx)
                amap = {}
                submol = Chem.PathToSubmol(mol, env, atomMap=amap)
                # the centre atom is missing from amap when no environment was found
                if amap.get(idx) is not None:
                    print(atno, idx, iradius,
                          Chem.MolToSmiles(submol, rootedAtAtom=amap[idx], canonical=True))

You can then load the output into a DB for searching. You might be able to tweak this to suit your purposes?

best wishes
Richard

-----Original Message-----
From: Chris Swain [mailto:sw...@mac.com]
Sent: 20 November 2016 18:44
To: rdkit-discuss@lists.sourceforge.net
Subject: [Rdkit-discuss] Atom Environments

Hi,

I have a project where I would like to find similar atom environments to a specified atom in a selected molecule.

For example, suppose I have this query molecule C1CNCC(C1)c1ccccc1, and the selected atom is the nitrogen. I also have a file containing SMILES strings and IDs for a list of reference molecules. I would like to identify the molecule within the reference molecules that contains a nitrogen most similar to the selected atom in the query molecule, even if the rest of the molecule is very different.

My feeling is to start with, say, a 3-atom radius and, if no similar atom is found above a set similarity, to repeat the search using a 2-atom radius, but to be honest I suspect it will require a bit of trial and error to see what the optimum radius is. I'd then want to return the ID of the most similar molecule.

I’ve had a look through the examples but not found anything that close.

Cheers

Chris

------------------------------------------------------------------------------
_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
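
The suggestion above of loading the printed output into a DB for searching, combined with the idea of stepping down from a larger radius to a smaller one, could look roughly like the sketch below. It is not anything posted in the thread: the table layout, the function names `load_environments` and `find_matches`, and the use of SQLite are all assumptions made for illustration. It only does exact matching on the environment SMILES at each radius, not a graded similarity score, and the SMILES-like strings in the demo rows are hand-written placeholders rather than real RDKit output.

```python
import sqlite3


def load_environments(conn, rows):
    """Store (mol_id, atom_idx, radius, env_smiles) rows, one per printed output line.

    The schema is a hypothetical choice for this sketch.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS atom_env ("
        "mol_id TEXT, atom_idx INTEGER, radius INTEGER, env_smiles TEXT)"
    )
    conn.executemany("INSERT INTO atom_env VALUES (?, ?, ?, ?)", rows)
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_env ON atom_env (radius, env_smiles)"
    )
    conn.commit()


def find_matches(conn, query_mol, query_atom, max_radius=3):
    """Return (radius, [mol_ids]) for the largest radius at which another
    molecule has an atom with the same environment SMILES, or None."""
    for radius in range(max_radius, -1, -1):
        cur = conn.execute(
            "SELECT DISTINCT ref.mol_id FROM atom_env AS q "
            "JOIN atom_env AS ref "
            "  ON ref.radius = q.radius AND ref.env_smiles = q.env_smiles "
            "WHERE q.mol_id = ? AND q.atom_idx = ? AND q.radius = ? "
            "  AND ref.mol_id != ? "
            "ORDER BY ref.mol_id",
            (query_mol, query_atom, radius, query_mol),
        )
        hits = [row[0] for row in cur.fetchall()]
        if hits:
            return radius, hits
    return None


# Placeholder rows: hand-written stand-ins for the printed output, not RDKit results.
rows = [
    ("query", 2, 0, "N"), ("query", 2, 1, "N(C)C"), ("query", 2, 2, "N(CC)CC"),
    ("ref1", 5, 0, "N"), ("ref1", 5, 1, "N(C)C"), ("ref1", 5, 2, "N(CC)CO"),
    ("ref2", 0, 0, "N"), ("ref2", 0, 1, "N(C)(C)C"),
]
conn = sqlite3.connect(":memory:")
load_environments(conn, rows)
result = find_matches(conn, "query", 2, max_radius=3)
print(result)
```

With these placeholder rows the search finds nothing at radius 3 or 2 and falls back to radius 1. Exact string matching is the simplest thing a DB lookup supports; a "most similar above a threshold" search would need a fingerprint or similarity measure on top, which this sketch does not attempt.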