Guys, my question was how to cast a fingerprint in the form of a binary
array back to the bit vector form, in order to calculate Tanimoto
distances. According to Curt's answer (thanks for that!), I can calculate
the Tanimoto similarity directly from the binary arrays. scipy's
distance.jaccard also works with numpy arrays (thanks Matthew!).
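
For the archive, here is a minimal sketch of both routes. It is an
illustration under assumptions, not a definitive recipe: the two 8-bit
arrays are made up (they are not real Morgan bits), tanimoto and
to_bitvect are names I chose, and returning 1.0 for two empty
fingerprints is my own convention. to_bitvect rebuilds a bit vector with
ExplicitBitVect and SetBitsFromList from rdkit.DataStructs, and imports
RDKit lazily so the numpy part runs without it.

```python
import numpy as np


def tanimoto(a, b):
    """Tanimoto similarity of two equal-length 0/1 arrays."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        # Convention chosen for this sketch: two empty fingerprints
        # are treated as identical.
        return 1.0
    return float(np.logical_and(a, b).sum() / union)


def to_bitvect(arr):
    """Rebuild an RDKit ExplicitBitVect from a 0/1 array (needs RDKit)."""
    # Imported here so that tanimoto() above works without RDKit installed.
    from rdkit import DataStructs

    bv = DataStructs.ExplicitBitVect(len(arr))
    # Set every bit whose position is non-zero in the array.
    bv.SetBitsFromList(np.flatnonzero(arr).tolist())
    return bv


# Toy 8-bit fingerprints, made up for illustration (not real Morgan bits).
fp1 = np.array([1, 1, 0, 1, 0, 0, 1, 0])
fp2 = np.array([1, 0, 0, 1, 1, 0, 1, 0])

# 3 bits set in both / 5 bits set in either
print(tanimoto(fp1, fp2))  # 0.6
```

With RDKit installed, DataStructs.TanimotoSimilarity(to_bitvect(fp1),
to_bitvect(fp2)) should give the same value, and 1 minus scipy's
distance.jaccard on the boolean arrays is the same quantity.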




On 17 March 2017 at 05:05, Greg Landrum <greg.land...@gmail.com> wrote:

> I'm a bit confused by all this. The RDKit has Tanimoto (and a bunch of
> other similarity measures) built in:
>
> In [6]: from rdkit import DataStructs
>
> In [7]: fp1 = rdMolDescriptors.GetMorganFingerprintAsBitVect(theobromine,2,2048)
>
> In [8]: fp2 = rdMolDescriptors.GetMorganFingerprintAsBitVect(caffeine,2,2048)
>
> In [9]: DataStructs.TanimotoSimilarity(fp1,fp2)
> Out[9]: 0.5294117647058824
>
> Is there a reason that you're interested in using something else?
>
> -greg
>
>
>
> On Thu, Mar 16, 2017 at 9:42 PM, matthew <msedd...@sheffield.ac.uk> wrote:
>
>> I don't think you even need to cast them to numpy arrays if you use
>> scipy. It should be able to take bit arrays. Also, jaccard distance is
>> another name for tanimoto distance. This simplifies the code above:
>>
>> from __future__ import print_function
>> from rdkit import Chem
>> from rdkit.Chem import AllChem
>>
>> from scipy.spatial import distance
>>
>> mol1 = Chem.MolFromSmiles('CCO')
>> mol2 = Chem.MolFromSmiles('CCC')
>>
>> fp1 = AllChem.GetMorganFingerprintAsBitVect(mol1, 8)
>> fp2 = AllChem.GetMorganFingerprintAsBitVect(mol2, 8)
>>
>> # jaccard distance is the same as tanimoto distance
>> # 1 - distance = similarity
>> print(1 - distance.jaccard(fp1, fp2))
>>
>> # 0.4285714285714286
>>
>> Matt
>> PhD Student
>> Chemoinformatics Research Group
>> University of Sheffield
>>
>>
>> On 16/03/2017 17:38, Curt Fischer wrote:
>>
>> If you are looking for something quick and dirty, you could stay in numpy
>> to calculate Tanimoto.
>>
>> # __future__ imports must come before any other statement
>> from __future__ import division
>>
>> from rdkit import Chem
>> from rdkit.Chem import AllChem
>>
>> import numpy as np
>>
>> mol1 = Chem.MolFromSmiles('CCO')
>> mol2 = Chem.MolFromSmiles('CCC')
>>
>> fp1 = np.array(AllChem.GetMorganFingerprintAsBitVect(mol1, 8), dtype='bool')
>> fp2 = np.array(AllChem.GetMorganFingerprintAsBitVect(mol2, 8), dtype='bool')
>>
>> def tanimoto(v1, v2):
>>     """
>>     Calculates tanimoto similarity for two bit vectors
>>     """
>>     return np.bitwise_and(v1, v2).sum() / np.bitwise_or(v1, v2).sum()
>>
>> tanimoto(fp1, fp2)
>>
>> Out[4]: 0.42857142857142855
>>
>>
>> On Thu, Mar 16, 2017 at 7:28 AM, Thomas Evangelidis <teva...@gmail.com>
>> wrote:
>>
>>> Hello,
>>>
>>> I created a numpy array from a molecule using the following function:
>>>
>>> AllChem.GetMorganFingerprintAsBitVect()
>>>
>>>
>>> Now I would like to convert the numpy array back to a bit vector, in
>>> order to calculate the Tanimoto similarity of two compounds. Is this
>>> possible?
>>>
>>> thanks
>>> Thomas
>>>
>>>
>>>
>>> --
>>>
>>> ======================================================================
>>>
>>> Thomas Evangelidis
>>>
>>> Research Specialist
>>> CEITEC - Central European Institute of Technology
>>> Masaryk University
>>> Kamenice 5/A35/1S081,
>>> 62500 Brno, Czech Republic
>>>
>>> email: tev...@pharm.uoa.gr
>>>
>>>           teva...@gmail.com
>>>
>>>
>>> website: https://sites.google.com/site/thomasevangelidishomepage/
>>>
>>>
>>> ------------------------------------------------------------
>>> ------------------
>>> Check out the vibrant tech community on one of the world's most
>>> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
>>> _______________________________________________
>>> Rdkit-discuss mailing list
>>> Rdkit-discuss@lists.sourceforge.net
>>> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>>>
>>>
>>
>>


-- 

======================================================================

Thomas Evangelidis

Research Specialist
CEITEC - Central European Institute of Technology
Masaryk University
Kamenice 5/A35/1S081,
62500 Brno, Czech Republic

email: tev...@pharm.uoa.gr

          teva...@gmail.com


website: https://sites.google.com/site/thomasevangelidishomepage/