This RDKit blog post uses some code from Nadine to provide SMARTS strings
corresponding to standard Morgan ("ECFP") bits:
http://rdkit.blogspot.com/2016/03/explaining-morgan-similarity.html

At the moment it's not possible to do the same thing with the new
fingerprinting code; that piece didn't get finished during GSoC last year.

You can get the raw atom invariants (integers) for the feature Morgan
(FCFP-like) fingerprint with rdMolDescriptors.GetFeatureInvariants().
There isn't really a quick way to get an explanation of these, but the code
that generates them is here:
https://github.com/rdkit/rdkit/blob/master/Code/GraphMol/Fingerprints/FingerprintUtil.cpp#L219
Each invariant is a bit mask with one bit per feature type, so based on
that code we can write this function:
def explainFeatMorganInvariant(invar):
    # Each invariant is a bit mask: bit i is set if the atom matches feats[i].
    # Ordering from here:
    # https://github.com/rdkit/rdkit/blob/master/Code/GraphMol/Fingerprints/FingerprintUtil.cpp#L182
    feats = ['Donor','Acceptor','Aromatic','Halogen','Basic','Acidic']
    res = []
    for i, feat in enumerate(feats):
        if invar & (1 << i):
            res.append(feat)
    return '|'.join(res)

which produces:

from rdkit import Chem
from rdkit.Chem import rdMolDescriptors

invars = rdMolDescriptors.GetFeatureInvariants(Chem.MolFromSmiles('FCCO'))
print(invars)
print([explainFeatMorganInvariant(x) for x in invars])
----
[8, 0, 0, 3]
['Halogen', '', '', 'Donor|Acceptor']
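
The same bit-mask decoding can also be sketched with Python's standard
enum.IntFlag, which keeps the bit positions and names together. This is
only an illustration: the FeatureFlag class and explain_with_flags function
are hypothetical names, not part of the RDKit, and the bit order is assumed
to be the same as in the feats list above.

```python
from enum import IntFlag


class FeatureFlag(IntFlag):
    # Assumption: bit order matches the feats list above
    # (bit 0 = Donor ... bit 5 = Acidic).
    Donor = 1 << 0
    Acceptor = 1 << 1
    Aromatic = 1 << 2
    Halogen = 1 << 3
    Basic = 1 << 4
    Acidic = 1 << 5


def explain_with_flags(invar):
    """Return the names of the feature bits set in one invariant."""
    return '|'.join(f.name for f in FeatureFlag if invar & f)


# The invariants printed above for 'FCCO'
print([explain_with_flags(x) for x in [8, 0, 0, 3]])
# prints ['Halogen', '', '', 'Donor|Acceptor']
```

This needs only the standard library, so it can be checked without an RDKit
install by feeding it the integers printed above.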

-greg


On Fri, Feb 1, 2019 at 4:11 AM Francois Berenger <mli...@ligand.eu> wrote:

> Hi,
>
> I have a related question:
> how to output the type of an atom in a molecule,
> if possible in a human-readable format; i.e. a human
> readable/understandable string rather than some (obscure) integer.
>
> I am interested to look at the atom types used by the ECFP
> and the FCFP fingerprints.
>
> Thanks a lot,
> Francois.
>
> On 31/01/2019 08:49, Lewis Martin wrote:
> > Thanks so much Greg!
> >
> > If I catch your drift, you are talking about the new fingerprint
> > generators from the google summer of code. I took a look myself since
> > I was curious.
> >
> > Here's a notebook demonstrating how I think it works:
> >
> > https://github.com/ljmartin/snippets/blob/master/snippet_fp_with_invariants.ipynb [3]
> > This downloads some bioactivity data from ChEMBL and then compares
> > standard AP or TT fingerprints with the same fingerprints built using
> > the atom invariants associated with the MorganFP "Feature" atom
> > typing, which is actually the feature types from the Gobbi/Poppinger
> > paper.  As expected, the invariant versions have higher similarity!
> > It's not CATS but this seems equivalent for my purposes - thanks!
> >
> > Hopefully it's close to the mark - looking forward to seeing other
> > examples too.
> > cheers
> > lewis
> >
> > On Thu, Jan 31, 2019 at 12:03 AM Greg Landrum <greg.land...@gmail.com>
> > wrote:
> >
> >> Hi Lewis,
> >>
> >> This is a great chance to demonstrate some of the things that can be
> >> done with the new fingerprint generation code. It's going to take me
> >> a bit to put this together (it's all new enough that I'm still not
> >> quite "fluent"), but I will try to get an example put together over
> >> the next couple of days.
> >>
> >> -greg
> >>
> >> On Wed, Jan 30, 2019 at 4:59 AM Lewis Martin
> >> <lewis.marti...@gmail.com> wrote:
> >>
> >>> Hi rdkitters,
> >>> I'd like to compare the similarity of torsion/atom pair FPs using
> >>> standard atomic numbering with those using pharmacophore types,
> >>> like the 'CATS' atom typing developed by Gisbert Schneider, and
> >>> hoped someone has some advice here. _CATS_ is a pharmacophore atom
> >>> typing system with these types: H-bond donor, H-bond acceptor,
> >>> positive, negative, lipophilic, and CATS2 has 'aromatic'. These
> >>> are described in: _“Scaffold‐Hopping” by Topological
> >>> Pharmacophore Search: A Contribution to Virtual Screening._ It
> >>> seems pretty close to the Gobbi 2D pharmacophore typing, or the
> >>> features used in FCFP.
> >>>
> >>> I've no problem detecting the atom types - I borrowed code from the
> >>> open source PyBioMed - but I'm stuck at the next step. How to
> >>> change the atoms into their pharmacophore types to then make a
> >>> torsion or atom pair fingerprint using RDKit? What I've tried so
> >>> far is to just set the atomic number to some series of 5 atoms not
> >>> normally seen in drug like molecules, like 40-44. This is silly
> >>> but it seems to work. The only issue is trouble kekulizing the
> >>> molecules for display. Is there a better way?
> >>>
> >>> Here's a snippet to demonstrate what I mean, it's adapted from
> >>> PyBioMed and any errors are probably mine:
> >>>
> >>> https://github.com/ljmartin/snippets/blob/master/atom_typing_snippet.ipynb [1]
> >>>
> >>> Thanks for your time!
> >>> lewis
> >>>
> >>> _______________________________________________
> >>> Rdkit-discuss mailing list
> >>> Rdkit-discuss@lists.sourceforge.net
> >>> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss [2]
> >
> >
> > Links:
> > ------
> > [1] https://github.com/ljmartin/snippets/blob/master/atom_typing_snippet.ipynb
> > [2] https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
> > [3] https://github.com/ljmartin/snippets/blob/master/snippet_fp_with_invariants.ipynb
> >