Hi Greg,

thanks for your answer. I also figured that the invariant feature of the AtomPair functions should be well suited. Initially, I thought that the pi-electron and neighbouring atom count would always be added, but this is clearly not the case.
The definition of the pharmacophoric patterns was initially described in the PATTY paper (Bush & Sheridan, J. Chem. Inf. Comput. Sci., 1993) and as far as I know the definitions are still valid. Luckily, there is a SMARTS-based version floating around (allegedly based on an early OpenBabel version): http://tripod.nih.gov/files/patty.rules

I compared those to the original definitions from the paper and they seem to match and overall make sense. I implemented a quick Typer based on this definition, taking into account the fact that only the first atom in the SMARTS is typed. I attached an IPython notebook containing a small example, including the example molecule from the paper.

Btw., is the hashing method used for creating the hashed fingerprint still valid / suitable if only a small fraction of the maximal numAtomPairFingerprintBits / codeSize is actually used?

All the best,
Michael

On Tue, Jul 14, 2015 at 4:15 AM, Greg Landrum <[email protected]> wrote:

> Hi Michael,
>
> On Sun, Jul 5, 2015 at 2:43 PM, Michael Reutlinger <[email protected]> wrote:
>
>> I would like to use a machine learning method with the AP and DP
>> descriptors as described by Robert Sheridan.
>>
>> AP descriptors are the 'atom pair' descriptors from Carhart et al. 1985
>> and I think they are already available in RDKIT.
>
> Indeed, they are.
>
>> DP 'donor−acceptor pair', called 'BP' in Kearsley et al. 1996, is a
>> reduced pharmacophore version of AP.
>>
>> I would like to know if you think there is a straightforward way to use
>> the existing AP functionality (maybe using atomInvariants) to reproduce
>> the descriptor as described in Kearsley et al.?
>
> Yes, if you have a way to assign integer invariants (atom types) to atoms
> that correspond to the BP features described in Kearsley et al. then it
> would be very straightforward to use those in the calculation. Note that
> the RDKit atom pair code does require that all atoms have a type (i.e. you
> can't have atoms that are ignored like some pharmacophore methods would
> do), but looking at the paper it seems like this isn't a problem: any atom
> that doesn't get assigned to one of the other classes just gets put into
> class 7: "other".
>
> The paper does not, unfortunately, include enough information to directly
> implement the fingerprint: you will need to come up with definitions
> (probably SMARTS-based?) for the 6 atom classes. I've thought several
> times about adding the fingerprint-types from the paper to the RDKit (and
> then testing them out using Sereina's benchmarking platform), but this has
> always ended up getting hung-up on the missing atom-type definitions.
>
> -greg
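
For readers without the notebook, here is a minimal sketch of the typing step discussed in this thread: only the first atom of each SMARTS match receives a type, and every atom left untyped falls into class 7 ("other"). It is not the code from the attached notebook. The function name `assign_types`, the integer codes 3 and 4, and the example matches are hypothetical, and letting later rules overwrite earlier ones is an assumption. The matches are taken as plain tuples of atom indices, as a substructure matcher would return them, so the sketch runs without RDKit.

```python
OTHER = 7  # class 7, "other", as described in the quoted reply


def assign_types(num_atoms, rule_hits, default=OTHER):
    """Return one integer type per atom.

    rule_hits: iterable of (type_code, matches) in rule order, where
    matches is a sequence of tuples of atom indices (one tuple per
    SMARTS match). Only the first atom of each match is typed.
    """
    types = [default] * num_atoms
    for type_code, matches in rule_hits:
        for match in matches:
            # Assumption: a later rule overwrites an earlier assignment.
            types[match[0]] = type_code
    return types


# Hypothetical matches for a 5-atom molecule.
hits = [
    (3, [(0, 1)]),       # pattern spans atoms 0 and 1; only atom 0 is typed
    (4, [(2,), (4, 3)]), # atoms 2 and 4 are typed; atom 3 is not
]
invariants = assign_types(5, hits)
print(invariants)
```

A list like `invariants`, with one integer per atom and no atom left untyped, is the kind of input the atomInvariants option mentioned above expects.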
Patty.ipynb
Description: Binary data
_______________________________________________ Rdkit-discuss mailing list [email protected] https://lists.sourceforge.net/lists/listinfo/rdkit-discuss

