For similarity searches (i.e., I have a molecule; give me similar molecules in
my database that I can look through or test), I would suggest using the Morgan
FPs. This is more or less what those FPs are designed for.
Not sure about substructure searching.
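As a rough illustration of the similarity search described above, here is a minimal sketch that ranks stored fingerprints against a query by Tanimoto coefficient. It is plain Python and does not call RDKit or the cartridge; the on-bit sets, the molecule names and the 0.5 threshold are made-up assumptions standing in for real Morgan bit vectors.

```python
def tanimoto(a, b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def most_similar(query, database, threshold=0.5):
    """Return (name, score) pairs scoring at least `threshold`, best first."""
    hits = [(name, tanimoto(query, fp)) for name, fp in database.items()]
    hits = [(name, score) for name, score in hits if score >= threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical on-bit sets standing in for stored Morgan fingerprints.
database = {
    "mol_a": frozenset({1, 5, 9, 12}),
    "mol_b": frozenset({1, 5, 40}),
    "mol_c": frozenset({2, 3, 77}),
}
query = frozenset({1, 5, 9, 12, 40})

ranked = most_similar(query, database)
print(ranked)
```

In a real database the same ranking would normally be done by the cartridge's own similarity operators over an indexed fingerprint column rather than in client code.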
Nik
From: JP [mailto:jeanpaul.ebe...@inhibox.com]
Sent: Tuesday, March 08, 2011 12:13 PM
To: Stiefl, Nikolaus
Cc: rdkit-discuss@lists.sourceforge.net
Subject: Re: [Rdkit-discuss] Best practice: which (database) fingerprints to use ?
Yes, it does help -- I don't need the 'fuzziness', so I will stick to morganbv.
How do the RDKit fp and the Morgan bv compare? Is there any particular scenario
where one outdoes the other?
Many Thanks
JP
On 8 March 2011 10:58, Stiefl, Nikolaus <nikolaus.sti...@novartis.com> wrote:
Hi JeanPaul,
The difference between featmorganbv and morganbv is that the first one uses
pharmacophore features to describe atoms, whereas the other one uses atom types
(it essentially corresponds to the ECFP descriptors). I would suggest using
featmorganbv_fp only if you want to do fuzzier similarity searching
(scaffold hopping and the like).
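To make the 'fuzziness' concrete, here is a deliberately simplified toy in plain Python. It only looks at single atoms (radius 0), whereas real Morgan fingerprints also encode each atom's neighbourhood. The atom labels and the FEATURE_CLASS table are assumptions made up for this illustration and are not RDKit's actual invariants or feature definitions.

```python
# Hypothetical, heavily simplified feature classes (not RDKit's definitions).
# "c" stands for an aromatic carbon.
FEATURE_CLASS = {"c": "aromatic", "F": "halogen", "Cl": "halogen", "Br": "halogen"}


def atom_type_invariants(atoms):
    """ECFP-like toy: every distinct atom type is its own invariant."""
    return frozenset(atoms)


def feature_invariants(atoms):
    """FCFP-like toy: atoms are reduced to a coarse feature class."""
    return frozenset(FEATURE_CLASS.get(atom, "other") for atom in atoms)


def tanimoto(a, b):
    """Tanimoto coefficient of two sets of invariants."""
    shared = len(a & b)
    union = len(a) + len(b) - shared
    return shared / union if union else 0.0


# Toy atom lists for chlorobenzene and bromobenzene (hydrogens ignored).
chlorobenzene = ["c"] * 6 + ["Cl"]
bromobenzene = ["c"] * 6 + ["Br"]

atom_type_score = tanimoto(atom_type_invariants(chlorobenzene),
                           atom_type_invariants(bromobenzene))
feature_score = tanimoto(feature_invariants(chlorobenzene),
                         feature_invariants(bromobenzene))
print(atom_type_score, feature_score)
```

With atom types, Cl and Br count as different; with feature classes, both collapse to "halogen", so the two molecules look identical to the toy fingerprint. That loss of detail is what makes the feature-based variant better suited to scaffold hopping and less suited to exact lookups.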
The RDKit fingerprint is (as stated below) a Daylight-like FP that uses hashed
molecular subgraphs. It is OK depending on what you want to do with it, though
maybe you have a better in-house descriptor that is optimized for substructure
searching.
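For context, hashed-subgraph fingerprints are commonly used as a prefilter for substructure search: if the query really is a substructure of a target, every bit set for the query should also be set for the target. The sketch below shows that screening step in plain Python. The fragment strings, the 64-bit length and the crc32 hash are assumptions chosen for illustration and are not how RDKit enumerates or hashes subgraphs.

```python
import zlib

N_BITS = 64  # toy fingerprint length; real fingerprints are usually much longer


def hashed_fingerprint(fragments):
    """Fold a collection of fragment strings into an integer bit mask."""
    bits = 0
    for fragment in fragments:
        bits |= 1 << (zlib.crc32(fragment.encode()) % N_BITS)
    return bits


def may_contain(target_bits, query_bits):
    """True if every bit set in the query is also set in the target."""
    return (target_bits & query_bits) == query_bits


# Hypothetical hand-written fragments standing in for enumerated subgraphs.
query = hashed_fingerprint({"C", "O", "C-O"})
ethanol = hashed_fingerprint({"C", "O", "C-C", "C-O", "C-C-O"})
propane = hashed_fingerprint({"C", "C-C", "C-C-C"})

print(may_contain(ethanol, query))  # always True: ethanol has every query fragment
print(may_contain(propane, query))  # usually False, but hash collisions can pass
```

A failed screen rules the target out. A passed screen only means "maybe", because different fragments can hash to the same bit, so each surviving candidate still needs a real substructure match.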
Hope that helps
Nik
From: JP [mailto:jeanpaul.ebe...@inhibox.com]
Sent: Tuesday, March 08, 2011 11:05 AM
To: rdkit-discuss@lists.sourceforge.net
Subject: [Rdkit-discuss] Best practice: which (database) fingerprints to use ?
Hi there,
I am storing a ton of molecules (~8M - it would be a ton if you printed them all
out, and hence used up all of the trees in Regent's Park) in a database and using
fingerprints for substructure and similarity searches. The fingerprints I am
currently using are the ones I took blindly from the wiki documentation
(when in doubt, copy) - specifically torsionbv_fp, morganbv_fp and
atompairbv_fp (from http://code.google.com/p/rdkit/wiki/DatabaseCreation2).
Now I look at the database cartridge documentation
(http://code.google.com/p/rdkit/wiki/ReferenceDocumentation) and I see there are
others, some of which I have actually heard about:
featmorganbv_fp(mol,int) : returns a bfp which is the bit vector Morgan
fingerprint for a molecule using chemical-feature invariants. The second
argument provides the radius. This is an FCFP-like fingerprint.
rdkit_fp(mol) : returns a bfp which is the RDKit fingerprint for a molecule.
This is a daylight-fingerprint using hashed molecular subgraphs.
What is the best practice here? Is it to use rdkit_fp ? (I assume this was
added later - and possibly the original documentation is out of date)
What is the difference between featmorganbv and the one I am using (i.e.
morganbv_fp) ?
What do you suggest in your experience?
Any ideas will be highly appreciated - as right now I am quite without any
myself.
Many Thanks
JP
_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss