For similarity searches (i.e., I have a molecule, give me similar molecules in 
my database that I can look through or test) I would suggest using the Morgan 
FPs. This is more or less what those FPs are designed for.
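
A minimal sketch of what such a similarity search does, assuming each 
fingerprint is available as a set of on-bit indices. The fingerprints and 
molecule names below are hand-written toy data, not real Morgan output, and 
the 0.5 threshold is an arbitrary illustration:

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 0.0


def rank_by_similarity(query_fp, database, threshold=0.5):
    """Return (name, score) pairs at or above the threshold, best first."""
    scored = [(name, tanimoto(query_fp, fp)) for name, fp in database.items()]
    hits = [(name, score) for name, score in scored if score >= threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Toy data (hypothetical): in practice these bits would come from a
# Morgan-type fingerprint such as the one stored by morganbv_fp.
toy_db = {
    "mol_a": {1, 5, 9, 12},
    "mol_b": {1, 5, 9, 40},
    "mol_c": {2, 3, 7},
}
query_fp = {1, 5, 9, 12}

hits = rank_by_similarity(query_fp, toy_db)
print(hits)
```

The molecules returned are the ones you would then look through or test.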

Not sure about substructure searching.

Nik


From: JP [mailto:jeanpaul.ebe...@inhibox.com]
Sent: Tuesday, March 08, 2011 12:13 PM
To: Stiefl, Nikolaus
Cc: rdkit-discuss@lists.sourceforge.net
Subject: Re: [Rdkit-discuss] Best practice: which (database) fingerprints to 
use ?

Yes it does help -- I don't need the 'fuzziness' so I will stick to morganbv.

How do the RDKit fp and the morgan bv compare? Is there any particular 
scenario where one outdoes the other?

Many Thanks
JP


On 8 March 2011 10:58, Stiefl, Nikolaus <nikolaus.sti...@novartis.com> wrote:
Hi JeanPaul,

The difference between featmorganbv and morganbv is that the first one uses 
pharmacophore features for atom descriptions, whereas the other one uses atom 
types (it essentially corresponds to the ECFP descriptors). I would suggest 
using featmorganbv_fp only if you want to do fuzzier similarity searching 
(scaffold-hopping and the like).
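
To make the 'fuzziness' concrete, here is a deliberately simplified sketch. It 
only looks at single atoms (a radius-0 view) and assumes each atom is described 
either by its element or by a feature class. The feature labels ("donor", 
"other") and both toy molecules are hypothetical; real atom-type and feature 
invariants carry more information than this:

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints given as sets of identifiers."""
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 0.0


def radius0_fp(atoms, use_features):
    """Collect one identifier per atom: its feature class or its element."""
    return {feature if use_features else element for element, feature in atoms}


# Each atom is (element, hypothetical feature class).
alcohol_like = [("C", "other"), ("C", "other"), ("O", "donor")]
amine_like = [("C", "other"), ("C", "other"), ("N", "donor")]

# Atom-type view: O and N are different, so the molecules look dissimilar.
sim_atom_types = tanimoto(
    radius0_fp(alcohol_like, use_features=False),
    radius0_fp(amine_like, use_features=False),
)

# Feature view: both heteroatoms are "donor", so the molecules look identical.
sim_features = tanimoto(
    radius0_fp(alcohol_like, use_features=True),
    radius0_fp(amine_like, use_features=True),
)

print(sim_atom_types, sim_features)
```

The feature-based description scores the two toy molecules as more similar, 
which is the behaviour that helps with scaffold-hopping and that you may not 
want for stricter searches.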

The RDKit fingerprint is (as stated below) a Daylight-like fingerprint that 
uses hashed molecular subgraphs. Whether it is OK depends on what you want to 
do with it; you may have a better in-house descriptor that is optimized for 
substructure searching, though.
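
For context, a hashed-subgraph fingerprint is typically used in substructure 
searching as a pre-filter: if the query is a substructure of a molecule, every 
bit set for the query should also be set for that molecule. A minimal sketch, 
assuming fingerprints are sets of on-bit indices (the data below is toy data, 
not real rdkit_fp output):

```python
def passes_screen(query_fp, mol_fp):
    """True if every bit set in the query is also set in the molecule."""
    return query_fp <= mol_fp


def screen(query_fp, database):
    """Return names of molecules that survive the fingerprint pre-filter."""
    return [name for name, fp in database.items() if passes_screen(query_fp, fp)]


# Toy data (hypothetical bit positions).
toy_db = {
    "mol_a": {3, 8, 21, 34},
    "mol_b": {3, 8, 55},
    "mol_c": {8, 21},
}
query_fp = {3, 8}

candidates = screen(query_fp, toy_db)
print(candidates)
```

Because different subgraphs can hash to the same bit, the survivors are only 
candidates; each one still needs a real atom-by-atom substructure match.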

Hope that helps
Nik


From: JP [mailto:jeanpaul.ebe...@inhibox.com]
Sent: Tuesday, March 08, 2011 11:05 AM
To: rdkit-discuss@lists.sourceforge.net
Subject: [Rdkit-discuss] Best practice: which (database) fingerprints to use ?


Hi there,

I am storing a ton of molecules (~8M - it would be a ton if you printed them 
all out, and hence used all of the trees in Regent's Park) in a database and 
using fingerprints for substructure and similarity searches.  The fingerprints 
I am currently using are the ones I took blindly from the wiki documentation 
(when in doubt, copy) - specifically torsionbv_fp, morganbv_fp and 
atompairbv_fp (from http://code.google.com/p/rdkit/wiki/DatabaseCreation2).

Now I look at the database cartridge documentation - 
http://code.google.com/p/rdkit/wiki/ReferenceDocumentation and I see there are 
others - some of which I have actually heard about:

featmorganbv_fp(mol,int) : returns a bfp which is the bit vector Morgan 
fingerprint for a molecule using chemical-feature invariants. The second 
argument provides the radius. This is an FCFP-like fingerprint.

rdkit_fp(mol) : returns a bfp which is the RDKit fingerprint for a molecule. 
This is a daylight-fingerprint using hashed molecular subgraphs.

What is the best practice here?  Is it to use rdkit_fp? (I assume this was 
added later, and possibly the original documentation is out of date.)
What is the difference between featmorganbv_fp and the one I am using (i.e. 
morganbv_fp)?
What do you suggest, in your experience?
Any ideas will be highly appreciated, as right now I am quite without any 
myself.

Many Thanks
JP

_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
