Hello,

In the past, I've had very good experience with rooted fingerprints. They were introduced by Vulpetti et al. to describe the local environment of fluorine (LEF) atoms. Later on, we used them to compare ionization sites in a MoKa retraining study (Gedeck et al.).
The LEF code in the RDKit Contrib directory contains the original implementation. Here is a slightly more generic version:

    from rdkit.Chem.AtomPairs import Torsions

    def getAtomEnvironmentFP(mol, atom, maxPathLength=7):
        """Return the atom environment fingerprint around atom (an atom index)."""
        # Start with the paths of length maxPathLength rooted at the atom ...
        fp = Torsions.GetHashedTopologicalTorsionFingerprint(
            mol, nBits=9192, targetSize=maxPathLength, fromAtoms=[atom])
        # ... then add the counts for the shorter paths (lengths 2 to maxPathLength - 1).
        for i in range(2, maxPathLength):
            nfp = Torsions.GetHashedTopologicalTorsionFingerprint(
                mol, nBits=9192, targetSize=i, fromAtoms=[atom])
            # items() replaces the Python 2 iteritems()
            for bit, v in nfp.GetNonzeroElements().items():
                fp[bit] = fp[bit] + v
        return fp

You can modify the number of bits used in hashing and the maximum path length. The values here worked well for the pKa study; the LEF code used maxPathLength=8 and the same number of bits. To compare environments, use DataStructs.BulkDiceSimilarity or DataStructs.DiceSimilarity.

Best,
Peter

Vulpetti, A.; Hommel, U.; Landrum, G.; Lewis, R.; Dalvit, C. Design and NMR-Based Screening of LEF, a Library of Chemical Fragments with Different Local Environment of Fluorine. J. Am. Chem. Soc. 2009, 131 (36), 12949−12959.

Gedeck, P.; Lu, Y.; Skolnik, S.; Rodde, S.; Dollinger, G.; Jia, W.; Berellini, G.; Faller, B.; Lombardo, F. The benefit of retraining pKa studied using internally measured data. J. Chem. Inf. Model. 2015, 55, 1449-1459. DOI: http://dx.doi.org/10.1021/acs.jcim.5b00172

On Mon, Nov 21, 2016 at 12:33 PM Chris Swain <sw...@mac.com> wrote:

> Hi,
>
> Thanks for this, it gives me a start.
>
> Cheers,
>
> Chris
>
> On 21 Nov 2016, at 08:59, Richard Hall <richard.h...@astx.com> wrote:
> >
> > We've been looking at something similar - the following code spits out a
> > canonical SMILES string for each atom based at a radius of 1...maxradius.
> >
> > from rdkit import Chem
> >
> > def atomenvironments(mol, atno, maxradius=6):
> >     for a in mol.GetAtoms():
> >         idx = a.GetIdx()
> >         print(atno, idx, 0, a.GetSmarts())
> >         for iradius in range(0, maxradius + 1):
> >             env = Chem.FindAtomEnvironmentOfRadiusN(mol, iradius, idx)
> >             amap = {}
> >             submol = Chem.PathToSubmol(mol, env, atomMap=amap)
> >             if amap.get(idx) is not None:
> >                 print(atno, idx, iradius,
> >                       Chem.MolToSmiles(submol, rootedAtAtom=amap[idx], canonical=True))
> >
> > You can then load the output into a DB for searching. You might be able
> > to tweak this to suit your purposes?
> >
> > best wishes
> > Richard
> >
> > -----Original Message-----
> > From: Chris Swain [mailto:sw...@mac.com]
> > Sent: 20 November 2016 18:44
> > To: rdkit-discuss@lists.sourceforge.net
> > Subject: [Rdkit-discuss] Atom Environments
> >
> > Hi,
> >
> > I have a project where I would like to find similar atom environments to
> > a specified atom in a selected molecule.
> >
> > For example
> >
> > Suppose I have this query molecule C1CNCC(C1)c1ccccc1, and the selected
> > atom is the nitrogen.
> >
> > I also have a file containing SMILES strings and ID for a list of
> > reference molecules.
> >
> > I would like to identify the molecule within the reference molecules
> > that contains a nitrogen most similar to the selected atom in the query
> > molecule even if the rest of the molecule is very different.
> >
> > My feeling is to start with say a 3 atom radius and if no similar atom
> > is found above a set similarity to repeat the search using a 2 atom radius,
> > but to be honest I suspect it will require a bit of trial and error to see
> > what the optimum radius is?
> >
> > I'd then want to return the ID of the most similar molecule.
> >
> > I’ve had a look through the examples but not found anything that close.
> >
> > Cheers
> >
> > Chris
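
For anyone who wants to see what the Dice comparison does with count fingerprints like the ones above, here is a minimal pure-Python sketch. It is not the RDKit implementation: fingerprints are modelled as plain dicts mapping bit to count, the function names (merge_counts, dice_similarity, most_similar) and the toy values are made up for illustration, and the formula 2 * sum(min(a, b)) / (sum(a) + sum(b)) is my understanding of what DataStructs.DiceSimilarity computes for sparse count vectors.

```python
def merge_counts(fingerprints):
    """Sum the counts of several {bit: count} fingerprints, one per path length."""
    merged = {}
    for fp in fingerprints:
        for bit, count in fp.items():
            merged[bit] = merged.get(bit, 0) + count
    return merged


def dice_similarity(fp_a, fp_b):
    """Dice similarity for count fingerprints (assumed formula, see the note above)."""
    total = sum(fp_a.values()) + sum(fp_b.values())
    if total == 0:
        return 0.0  # two empty environments: treat as dissimilar
    shared = sum(min(count, fp_b.get(bit, 0)) for bit, count in fp_a.items())
    return 2.0 * shared / total


def most_similar(query_fp, reference_fps):
    """Return (reference_id, similarity) for the best-matching environment."""
    scored = ((ref_id, dice_similarity(query_fp, fp))
              for ref_id, fp in reference_fps.items())
    return max(scored, key=lambda pair: pair[1])


# Toy data only: real fingerprints would come from getAtomEnvironmentFP.
query = merge_counts([{1: 1, 5: 1}, {1: 1}])
references = {
    "ref-A": {1: 1, 5: 1, 9: 3},
    "ref-B": {1: 2, 5: 1, 7: 1},
}
best_id, best_score = most_similar(query, references)
print(best_id, round(best_score, 3))
```

For the original question, the dict keys would be the IDs from the SMILES file, with one fingerprint per candidate nitrogen, and the ID with the highest score is the one to return.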
------------------------------------------------------------------------------
_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss