No worries. This, and Anna's question about similarity searching and 
clustering, illustrate a great opportunity for a tutorial on fingerprints and 
similarity searching. 
-greg

On Wed, Nov 23, 2016 at 3:00 PM +0100, "Chris Swain" <sw...@mac.com> wrote:

Thanks for this,
As a chemist who comes from the “cut and paste” school of scripting I’m always 
concerned I’m asking something blindingly obvious
;-)
Chris
On 23 Nov 2016, at 12:36, Greg Landrum <greg.land...@gmail.com> wrote:
[including rdkit-discuss, because it's relevant there and I'm pretty sure Chris 
won't mind and the real Pandas experts may have a better answer than me.]

On Wed, Nov 23, 2016 at 9:51 AM, Chris Swain <sw...@mac.com> wrote:


I quite like storing molecules and associated data in a data frame, and I’ve 
seen that it is possible to use rdkit for substructure searching. Is it 
possible to also do similarity searching?

It's not built in since there are many possible fingerprints that could be used.
It's not quite as convenient as the substructure search, but here's a little 
demo of what you can do to filter based on similarity:
# Start by adding a fingerprint column:
In [18]: df['mfp2'] = [rdMolDescriptors.GetMorganFingerprintAsBitVect(x,2) for x in df['ROMol']]

# and now filter:
In [21]: ndf = df[df.apply(lambda x: DataStructs.TanimotoSimilarity(x['mfp2'],qry)>=0.7, axis=1)]

In [23]: len(df)
Out[23]: 1000

In [24]: len(ndf)
Out[24]: 2
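
For anyone who wants to try this end to end, here is a minimal self-contained sketch of the same idea. It makes a few assumptions that are not in the session above: the five SMILES are made-up toy data standing in for the 1000-row frame, the query `qry` is arbitrarily taken to be phenol (the session does not show how `qry` was built), and the helper name `morgan_fp` is purely illustrative. Rather than a row-wise `apply`, this variant computes every similarity in one call with `DataStructs.BulkTanimotoSimilarity` and then filters on the resulting numeric column.

```python
import pandas as pd
from rdkit import Chem, DataStructs
from rdkit.Chem import rdMolDescriptors

# Hypothetical toy data standing in for the 1000-row frame in the thread.
smiles = ['c1ccccc1O', 'c1ccccc1N', 'CCO', 'CCCO', 'Cc1ccccc1O']
df = pd.DataFrame({'smiles': smiles})
df['ROMol'] = [Chem.MolFromSmiles(s) for s in smiles]


def morgan_fp(mol, radius=2):
    """Illustrative helper: Morgan bit-vector fingerprint, as used in the thread."""
    return rdMolDescriptors.GetMorganFingerprintAsBitVect(mol, radius)


# Assumed query: phenol. Any molecule of interest could be used here instead.
qry = morgan_fp(Chem.MolFromSmiles('c1ccccc1O'))

# One fingerprint per row, kept in a plain Python list.
fps = [morgan_fp(m) for m in df['ROMol']]

# Tanimoto similarity of the query against every fingerprint in one call.
df['sim'] = DataStructs.BulkTanimotoSimilarity(qry, fps)

# Keep only the rows at or above the 0.7 threshold used in the thread.
ndf = df[df['sim'] >= 0.7]
print(len(df), len(ndf))
```

Keeping the similarity as an ordinary float column also makes it easy to sort the hits (for example with `ndf.sort_values('sim', ascending=False)`) or to try a different threshold without recomputing fingerprints.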
-greg

------------------------------------------------------------------------------
_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss