Hi Takayuki

This happens because the CreateDifferenceFingerprintForReaction function
takes the whole structure of the molecules in a reaction into account. It
generates AtomPair FPs with a path length of up to 30 bonds for the
reactants and products and then builds the difference of those, which is
why you get this low similarity. If you would like to capture only the
transformation, you are better off with a more local version of the FPs,
like an AP FP with a path length of up to 3 bonds or a Morgan FP with a
radius of 1. Unfortunately this isn’t possible with the function above,
but please find an example below that allows doing this.
I hope that helps.

Best,
Nadine


from rdkit import Chem
from rdkit.Chem import AllChem
from rdkit import DataStructs
import copy


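def _createFP(mol, maxSize, fpType='AP'):
    # Reaction templates are query molecules, so update the property
    # cache non-strictly before fingerprinting.
    mol.UpdatePropertyCache(False)
    if fpType == 'AP':
        return AllChem.GetAtomPairFingerprint(mol, minLength=1,
                                              maxLength=maxSize)
    else:
        # Initialise ring information before computing the Morgan FP.
        Chem.GetSSSR(mol)
        return AllChem.GetMorganFingerprint(mol, radius=maxSize)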

def getSumFps(fps):
    summedFP = copy.deepcopy(fps[0])
    for fp in fps[1:]:
        summedFP += fp
    return summedFP

def buildReactionFP(rxn, maxSize=3, fpType='AP'):
    reactants = rxn.GetReactants()
    products = rxn.GetProducts()
    rFP = getSumFps([_createFP(mol, maxSize, fpType=fpType)
                     for mol in reactants])
    pFP = getSumFps([_createFP(mol, maxSize, fpType=fpType)
                     for mol in products])
    return pFP-rFP

# Your examples

rxn1 = AllChem.ReactionFromSmarts( '[C:1]C1CCCCC1>>[N:1]C1CCCCC1' )
rxn2 = AllChem.ReactionFromSmarts( '[C:1]C1CCCNC1>>[N:1]C1CCCNC1' )
rxn3 = AllChem.ReactionFromSmarts( '[C:1]c1ccccc1>>[N:1]c1ccccc1' )

rxfp1 = buildReactionFP(rxn1,maxSize=3)
rxfp2 = buildReactionFP(rxn2,maxSize=3)
rxfp3 = buildReactionFP(rxn3,maxSize=3)

tc12 = DataStructs.TanimotoSimilarity(rxfp1, rxfp2)
tc13 = DataStructs.TanimotoSimilarity(rxfp1, rxfp3)
tc23 = DataStructs.TanimotoSimilarity(rxfp2, rxfp3)

print(tc12,tc13,tc23)

>> (0.6666666666666666, 0.0, 0.0)

# Try a smaller path length

rxfp1 = buildReactionFP(rxn1,maxSize=2)
rxfp2 = buildReactionFP(rxn2,maxSize=2)
rxfp3 = buildReactionFP(rxn3,maxSize=2)

tc12 = DataStructs.TanimotoSimilarity(rxfp1, rxfp2)
tc13 = DataStructs.TanimotoSimilarity(rxfp1, rxfp3)
tc23 = DataStructs.TanimotoSimilarity(rxfp2, rxfp3)

print(tc12,tc13,tc23)

>> (1.0, 0.0, 0.0)

# Finally use Morgan with radius 1

rxfp1 = buildReactionFP(rxn1,maxSize=1,fpType='Morgan')
rxfp2 = buildReactionFP(rxn2,maxSize=1,fpType='Morgan')
rxfp3 = buildReactionFP(rxn3,maxSize=1,fpType='Morgan')

tc12 = DataStructs.TanimotoSimilarity(rxfp1, rxfp2)
tc13 = DataStructs.TanimotoSimilarity(rxfp1, rxfp3)
tc23 = DataStructs.TanimotoSimilarity(rxfp2, rxfp3)

print(tc12,tc13,tc23)

>> (1.0, 0.2, 0.2)
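
To see why the path length matters, here is a minimal pure-Python sketch of
the same idea. It does not use RDKit: the molecules are hand-built graphs,
the atom labels are just element symbols (lowercase for aromatic), and the
helper names (atom_pair_counts, difference_fp, tanimoto, toy_reaction) are
made up for illustration. Real RDKit atom-pair codes also encode neighbour
counts and pi electrons, and the similarity below is an assumed simple
variant for signed counts, so the numbers will not match the ones above
exactly.

```python
from collections import Counter, deque


def atom_pair_counts(labels, bonds, max_len):
    """Count (label, label, distance) pairs up to max_len bonds apart."""
    neighbors = {i: [] for i in range(len(labels))}
    for a, b in bonds:
        neighbors[a].append(b)
        neighbors[b].append(a)
    counts = Counter()
    for start in range(len(labels)):
        # Depth-limited breadth-first search: topological distances from start.
        dist = {start: 0}
        queue = deque([start])
        while queue:
            cur = queue.popleft()
            if dist[cur] == max_len:
                continue
            for nxt in neighbors[cur]:
                if nxt not in dist:
                    dist[nxt] = dist[cur] + 1
                    queue.append(nxt)
        for other, d in dist.items():
            if other > start:  # count each unordered pair once
                pair = tuple(sorted((labels[start], labels[other])))
                counts[pair + (d,)] += 1
    return counts


def difference_fp(reactant, product, max_len):
    """Product counts minus reactant counts, with zero entries dropped."""
    r = atom_pair_counts(*reactant, max_len)
    p = atom_pair_counts(*product, max_len)
    diff = {k: p[k] - r[k] for k in set(p) | set(r)}
    return {k: v for k, v in diff.items() if v != 0}


def tanimoto(fp_a, fp_b):
    """Assumed variant: overlap counts only where both signs agree."""
    overlap = sum(min(abs(fp_a[k]), abs(fp_b[k]))
                  for k in set(fp_a) & set(fp_b)
                  if fp_a[k] * fp_b[k] > 0)
    total = (sum(abs(v) for v in fp_a.values())
             + sum(abs(v) for v in fp_b.values()))
    return overlap / (total - overlap) if total else 1.0


# Atom 0 is the mapped atom (C -> N); atoms 1-6 form the six-membered ring.
RING_BONDS = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 1)]


def toy_reaction(ring_labels):
    reactant = (["C"] + ring_labels, RING_BONDS)
    product = (["N"] + ring_labels, RING_BONDS)
    return reactant, product


rxn1 = toy_reaction(["C"] * 6)                       # cyclohexane ring
rxn2 = toy_reaction(["C", "C", "C", "C", "N", "C"])  # piperidine ring
rxn3 = toy_reaction(["c"] * 6)                       # benzene ring

for max_len in (3, 2):
    fps = [difference_fp(r, p, max_len) for r, p in (rxn1, rxn2, rxn3)]
    print(max_len,
          tanimoto(fps[0], fps[1]),
          tanimoto(fps[0], fps[2]),
          tanimoto(fps[1], fps[2]))
```

In this toy model all pairs between unchanged atoms cancel in the
difference, so only pairs involving the changed atom remain. The ring
nitrogen of the second reaction sits three bonds away from that atom, so it
only affects the difference FP once the path length reaches 3; at a path
length of 2 the first two reactions give identical FPs.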



2016-07-25 15:44 GMT+02:00 Taka Seri <serit...@gmail.com>:

> Dear rdkitters,
> I want to analyse reactions or matched molecular pairs (molecular
> transformations) and build prediction models for them.
>
> I found a new function named CreateDifferenceFingerprintForReaction, so I
> tried to use it. But I am confused by the following result.
>
> I defined three reactions that transform C to N.
> I expected the Tanimoto similarities to be the same, but the similarities
> of the reactions were quite different.
> My code is as follows:
> from rdkit import Chem
> from rdkit.Chem import AllChem
> from rdkit import rdBase
> from rdkit.Chem import rdChemReactions
> from rdkit.Chem import DataStructs
>
> rdBase.rdkitVersion =>'2016.03.1'
>
> rxn1 = AllChem.ReactionFromSmarts( '[C:1]C1CCCCC1>>[N:1]C1CCCCC1' )
>
> rxn2 = AllChem.ReactionFromSmarts( '[C:1]C1CCCNC1>>[N:1]C1CCCNC1' )
>
> rxn3 = AllChem.ReactionFromSmarts( '[C:1]c1ccccc1>>[N:1]c1ccccc1' )
>
> rxfp1 = rdChemReactions.CreateDifferenceFingerprintForReaction(rxn1)
>
> rxfp2 = rdChemReactions.CreateDifferenceFingerprintForReaction(rxn2)
>
> rxfp3 = rdChemReactions.CreateDifferenceFingerprintForReaction(rxn3)
>
> tc12 = DataStructs.TanimotoSimilarity(rxfp1, rxfp2)
>
> tc13 = DataStructs.TanimotoSimilarity(rxfp1, rxfp3)
>
> tc23 = DataStructs.TanimotoSimilarity(rxfp2, rxfp3)
>
> print( tc12,tc13, tc23 )
>
> # I got the following scores. Why were the 2nd and 3rd similarities zero?
>
> 0.7142857142857143 0.0 0.0
>
> Any advice and suggestions will be greatly appreciated
> Best regards,
> Takayuki
>
>
> ------------------------------------------------------------------------------
> What NetFlow Analyzer can do for you? Monitors network bandwidth and
> traffic
> patterns at an interface-level. Reveals which users, apps, and protocols
> are
> consuming the most bandwidth. Provides multi-vendor support for NetFlow,
> J-Flow, sFlow and other flows. Make informed decisions using capacity
> planning
> reports.http://sdm.link/zohodev2dev
> _______________________________________________
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
>