Dear Jameed,

On Mon, Mar 11, 2013 at 7:10 PM, Jameed Hussain
<jameed.x.huss...@gsk.com> wrote:
>
<snip>
>
> I remember chatting to you at the UGM about this. It works okay – but it is
> slow (as you need to generate an fp for every atom you need the partial fp
> for) and can suffer from issues related to symmetry. Hence, I was wondering
> if you could add an option/enhancement to the topological fingerprinting
> code.
>
>
>
> Would it be possible to record the bits set for every atom in a given
> molecule as you generate the fingerprint. So something like a dictionary
> keyed on atom id with a value containing an array/set of the bits that get
> set for the atom. So as you hash a path, record the bits that are set to on
> for the ids of the atoms in the path. Hopefully, this isn’t a large piece of
> work.

It's not.

>
> It would make the partial_fp generation much quicker as I would just need to
> generate the fp once and the data structure would contain all the
> information needed to generate the partial fp for any atom/substructure in
> the molecule (without the symmetry issues). It would also have the benefit
> of providing a data structure to explain the bits for the topological
> fingerprint like you have for the Morgan fingerprint. I hope that is enough
> to convince you :-)

You had me already... this is just a nice extra bit. :-)

> Lastly, there isn’t an urgency as I have a slow implementation – I just want
> to make it quicker.

I just checked in an initial implementation. This will slow the
fingerprinter down somewhat when you're using the option, but it
shouldn't be that bad compared to the general slowness of the
fingerprinter.

From the Python side it looks like this:

In [1]: from rdkit import Chem

In [2]: l = []

In [3]: fp = Chem.RDKFingerprint(Chem.MolFromSmiles('CCCO'), minPath=1, maxPath=3,
   ...:                          nBitsPerHash=1, atomBits=l)

In [4]: list(fp.GetOnBits())
Out[4]: [242, 591, 718, 820, 1485]

In [6]: l
Out[6]:
[[718, 820, 1485],
 [718, 820, 591, 1485],
 [718, 242, 820, 591, 1485],
 [242, 591, 1485]]
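
The list has one entry per atom, in atom-index order, holding the bits set by paths that include that atom. As a rough sketch of how it could be used for the partial-fingerprint case (this is not part of the check-in; the helper names partial_fp_bits and bits_to_atoms are made up for illustration, and the data is simply the output above pasted in so the snippet runs without RDKit):

```python
# Per-atom bit lists copied from the session output above for 'CCCO'.
# In real use this would be the list passed as atomBits to Chem.RDKFingerprint.
atom_bits = [
    [718, 820, 1485],
    [718, 820, 591, 1485],
    [718, 242, 820, 591, 1485],
    [242, 591, 1485],
]


def partial_fp_bits(atom_bits, atom_ids):
    """Hypothetical helper: union of the bits set by paths touching any given atom."""
    bits = set()
    for idx in atom_ids:
        bits.update(atom_bits[idx])
    return sorted(bits)


def bits_to_atoms(atom_bits):
    """Hypothetical helper: invert the list into {bit id: [atom indices]}."""
    inverse = {}
    for idx, bits in enumerate(atom_bits):
        for bit in bits:
            inverse.setdefault(bit, []).append(idx)
    return inverse


print(partial_fp_bits(atom_bits, [0]))     # bits for the terminal carbon
print(partial_fp_bits(atom_bits, [2, 3]))  # bits for the C-O end of the molecule
print(bits_to_atoms(atom_bits)[242])       # atoms responsible for bit 242
```

Taking the union over all four atoms gives back the same bits as fp.GetOnBits() above, so the fingerprint only needs to be generated once per molecule.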


-greg
