Re: [Rdkit-discuss] AP / DP descriptors
Just an FYI on this one: I just merged a Python DP and DT implementation onto master. Here's the github issue referencing the commits: https://github.com/rdkit/rdkit/issues/574

I will try to get a C++ version done in time for the next release.

-- ___ Rdkit-discuss mailing list Rdkit-discuss@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
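Assuming DT is the torsion counterpart of DP (in the same way that the topological-torsion (TT) fingerprint relates to AP), it describes linear paths of four bonded atoms instead of pairs of atoms at a given distance. The following is a minimal pure-Python sketch of that idea, not the code from the merged commits. The molecule is a hand-written adjacency list, the class labels are hypothetical, and the branching terms RDKit adds to TT codes are left out.

```python
from collections import Counter

def four_atom_paths(adjacency):
    """Yield each simple path a-b-c-d of four bonded atoms once (one direction)."""
    seen = set()
    for a in adjacency:
        for b in adjacency[a]:
            for c in adjacency[b]:
                if c == a:
                    continue
                for d in adjacency[c]:
                    if d in (a, b):
                        continue
                    path = (a, b, c, d)
                    key = min(path, path[::-1])  # a path and its reverse are one torsion
                    if key not in seen:
                        seen.add(key)
                        yield path

def torsion_counts(adjacency, atom_types):
    """Count torsions as direction-independent tuples of atom types."""
    counts = Counter()
    for path in four_atom_paths(adjacency):
        typed = tuple(atom_types[i] for i in path)
        counts[min(typed, typed[::-1])] += 1
    return counts

# 3-amino-1-propanol, N0-C1-C2-C3-O4, as an adjacency list (hydrogens omitted).
adjacency = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
# Hypothetical class labels, for illustration only.
atom_types = ["cation", "other", "other", "other", "polar"]

for torsion, n in sorted(torsion_counts(adjacency, atom_types).items()):
    print(n, torsion)
```

Because each torsion key is built only from the per-atom labels, swapping in a different typing scheme changes the descriptor without touching the path enumeration.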
Re: [Rdkit-discuss] AP / DP descriptors
A couple of followups here since I had some time on a plane yesterday.

On Wed, Jul 15, 2015 at 11:02 AM, Greg Landrum greg.land...@gmail.com wrote:
> On Tuesday, July 14, 2015, Michael Reutlinger rd...@mulchi.de wrote:
>> thanks for your answer. I also figured that the invariant feature of the AtomPair functions should be well suited. Initially, I thought that the pi-electron and neighbouring atom count would always be added, but this is clearly not the case.
>
> It shouldn't be the case for the AP fingerprint. It *does*, however, look like there will be a problem with the TT fingerprints if you provide your own invariants (this is based on a skim of the code, so I'm not sure that's correct).

I skimmed too quickly before writing this response. User-provided atom invariants should also work properly with TT descriptors: the branching terms added in the TT calculation do seem to be handled properly with user-provided invariants.

>> Btw. is the hashing method used for creating the hashed fingerprint still valid / suitable if only a small number of the maximal numAtomPairFingerprintBits / codeSize is actually used?
>
> The quick read-through I did of the code makes me believe that it should be fine. I will confirm over the next day or so and let you know.

Looks like it should be fine.

-greg
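Why a sparsely used code space is not a problem for a hashed fingerprint can be seen in a small stdlib sketch: the keys are passed through a hash function before being reduced modulo the number of bits, so even a handful of atom classes lands across the whole bit range. SHA-1 via `hashlib` is a stand-in chosen for the sketch, not necessarily what RDKit uses internally, and the 7 classes and 1..30 path lengths are assumptions for illustration.

```python
import hashlib

def pair_key(type_a, distance, type_b):
    """Order-independent atom-pair key: (smaller type, distance, larger type)."""
    lo, hi = sorted((type_a, type_b))
    return (lo, distance, hi)

def fold_to_bits(keys, n_bits=2048):
    """Map each key to a bit position in [0, n_bits) through a hash function."""
    bits = set()
    for key in keys:
        digest = hashlib.sha1(repr(key).encode("ascii")).digest()
        bits.add(int.from_bytes(digest[:8], "little") % n_bits)
    return bits

# With 7 atom classes and path lengths 1..30 there are at most
# 28 unordered class pairs * 30 distances = 840 distinct keys. That is a tiny
# fraction of the integer range, but the hash spreads them across the bit range.
all_keys = {pair_key(a, d, b) for a in range(1, 8) for b in range(1, 8) for d in range(1, 31)}
bits = fold_to_bits(all_keys, n_bits=2048)
print(len(all_keys), len(bits))
```

Collisions still happen (some of the 840 keys share a bit), but that is the same trade-off any hashed fingerprint makes, regardless of how many distinct types feed into it.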
Re: [Rdkit-discuss] AP / DP descriptors
Hi Michael,

On Tuesday, July 14, 2015, Michael Reutlinger rd...@mulchi.de wrote:
> Hi Greg,
> thanks for your answer. I also figured that the invariant feature of the AtomPair functions should be well suited. Initially, I thought that the pi-electron and neighbouring atom count would always be added, but this is clearly not the case.

It shouldn't be the case for the AP fingerprint. It *does*, however, look like there will be a problem with the TT fingerprints if you provide your own invariants (this is based on a skim of the code, so I'm not sure that's correct).

> The definition of the pharmacophoric patterns was initially described in the PATTY paper (Bush & Sheridan, J. Chem. Inf. Comput. Sci., 1993) and as far as I know the definitions are still valid. Luckily, there is a SMARTS-based version floating around (allegedly based on an early OpenBabel version): http://tripod.nih.gov/files/patty.rules
> I compared those to the original definitions from the paper and they seem to match and overall make sense.

Thanks for that pointer. That is really helpful! If I manage to make the time to do so, I will incorporate this into the next RDKit release.

> I implemented a quick Typer based on this definition, taking into account the fact that only the first atom in the SMARTS is typed. I attached an IPython notebook containing a small example, including the example molecule from the paper.
>
> Btw. is the hashing method used for creating the hashed fingerprint still valid / suitable if only a small number of the maximal numAtomPairFingerprintBits / codeSize is actually used?

The quick read-through I did of the code makes me believe that it should be fine. I will confirm over the next day or so and let you know.

Best,
-greg
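The rule that only the first atom in each SMARTS pattern is typed can be made concrete with a short plain-Python sketch. In practice the matches would come from a SMARTS search (for example RDKit's `Chem.MolFromSmarts` followed by `Mol.GetSubstructMatches`, which returns one tuple of atom indices per match in query-atom order); here they are written by hand. The class names and patterns are hypothetical, not the patty.rules definitions, and the "first rule wins" precedence is an assumption rather than something taken from the rules file.

```python
def assign_types(num_atoms, rule_matches, default="other"):
    """Assign one class per atom from pattern matches.

    rule_matches is a list of (class_name, matches) pairs in rule order; each
    match is a tuple of atom indices, as a SMARTS substructure search returns it.
    Only match[0], the atom that matched the first SMARTS atom, receives the class.
    """
    types = [None] * num_atoms
    for class_name, matches in rule_matches:
        for match in matches:
            first = match[0]
            if types[first] is None:  # assumption: the first rule to claim an atom wins
                types[first] = class_name
    # Every atom needs a type, so anything unclaimed falls back to the default class.
    return [t if t is not None else default for t in types]

# Hypothetical matches for 3-amino-1-propanol (N0-C1-C2-C3-O4):
# an "N bonded to C" pattern matching (0, 1) types only atom 0, and an
# "O bonded to C" pattern matching (4, 3) types only atom 4.
rule_matches = [
    ("cation", [(0, 1)]),
    ("polar", [(4, 3)]),
    ("donor", [(4,)]),  # later rule; ignored for atom 4 under first-rule-wins
]
print(assign_types(5, rule_matches))
# ['cation', 'other', 'other', 'other', 'polar']
```

The carbon atoms that appear in the matches as context (atoms 1 and 3) are not typed by those patterns, which is the behaviour described for the first-atom-only convention.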
Re: [Rdkit-discuss] AP / DP descriptors
Hi Michael,

On Sun, Jul 5, 2015 at 2:43 PM, Michael Reutlinger rd...@mulchi.de wrote:
> I would like to use a machine learning method with the AP and DP descriptors as described by Robert Sheridan. AP descriptors are the 'atom pair' descriptors from Carhart et al. 1985 and I think they are already available in RDKit.

Indeed, they are.

> DP 'donor-acceptor pair', called 'BP' in Kearsley et al. 1996, is a reduced pharmacophore version of AP. I would like to know if you think there is a straightforward way to use the existing AP functionality (maybe using atomInvariants) to reproduce the descriptor as described in Kearsley et al.?

Yes, if you have a way to assign integer invariants (atom types) to atoms that correspond to the BP features described in Kearsley et al., then it would be very straightforward to use those in the calculation. Note that the RDKit atom pair code does require that all atoms have a type (i.e. you can't have atoms that are ignored, as some pharmacophore methods would do), but looking at the paper it seems like this isn't a problem: any atom that doesn't get assigned to one of the other classes just gets put into class 7: other.

The paper does not, unfortunately, include enough information to directly implement the fingerprint: you will need to come up with definitions (probably SMARTS-based?) for the 6 atom classes. I've thought several times about adding the fingerprint types from the paper to the RDKit (and then testing them out using Sereina's benchmarking platform), but this has always ended up getting hung up on the missing atom-type definitions.

-greg
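In RDKit the usual route is to pass one integer per atom as the `atomInvariants` argument of `rdMolDescriptors.GetAtomPairFingerprint` (or `GetHashedAtomPairFingerprint`). To show what those invariants do without depending on RDKit, here is a minimal stdlib sketch of a Carhart-style atom-pair count. It assumes a hand-written adjacency list, class numbers other than 7 ("other") are hypothetical, and the key is a plain tuple rather than RDKit's packed integer code.

```python
from collections import Counter, deque

def topological_distances(adjacency, start):
    """Breadth-first search: number of bonds from start to each reachable atom."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        atom = queue.popleft()
        for nbr in adjacency[atom]:
            if nbr not in dist:
                dist[nbr] = dist[atom] + 1
                queue.append(nbr)
    return dist

def atom_pair_counts(adjacency, invariants, max_distance=30):
    """Count (smaller invariant, distance, larger invariant) over atom pairs i < j.

    invariants[i] is looked up for every atom that takes part in a pair,
    so each atom needs a class (hence a catch-all "other" class).
    """
    counts = Counter()
    for i in adjacency:
        for j, d in topological_distances(adjacency, i).items():
            if i < j and d <= max_distance:
                lo, hi = sorted((invariants[i], invariants[j]))
                counts[(lo, d, hi)] += 1
    return counts

# 3-amino-1-propanol, N0-C1-C2-C3-O4, hydrogens omitted.
adjacency = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
# Hypothetical integer class codes: 1 = cation, 5 = polar, 7 = other.
invariants = [1, 7, 7, 7, 5]

for key, n in sorted(atom_pair_counts(adjacency, invariants).items()):
    print(n, key)
```

With five atoms there are 10 pairs; the two adjacent carbon-carbon pairs (C1-C2 and C2-C3) share the key `(7, 1, 7)`, so the output has 9 distinct keys. Replacing the default element-based atom types with pharmacophore classes is the only change needed to turn an AP-style count into a DP-style one.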
[Rdkit-discuss] AP / DP descriptors
Dear all,

I would like to use a machine learning method with the AP and DP descriptors as described by Robert Sheridan. AP descriptors are the 'atom pair' descriptors from Carhart et al. 1985 and I think they are already available in RDKit. DP 'donor-acceptor pair', called 'BP' in Kearsley et al. 1996, is a reduced pharmacophore version of AP.

I would like to know if you think there is a straightforward way to use the existing AP functionality (maybe using atomInvariants) to reproduce the descriptor as described in Kearsley et al.?

Best,
Michael